Hi all,

I want to highlight a problem that we face here at Cloudera and start a
discussion on how to go about solving it.

*Problem Statement:*
Our customers reach out to us when they face problems in their Spark
Applications. Those problems can be related to Spark, environment issues,
their own code, or something else altogether. These customers often run
their Spark Applications in Yarn Client mode, which, as we all know, uses
a ConsoleAppender to print driver logs to the console. These customers
usually send their Yarn logs to us to troubleshoot. As you may have
figured, these logs do not contain the driver logs, which makes it difficult
for us to troubleshoot the issue. In that scenario our customers end up running
the application again, piping the output to a log file or using a local log
appender and then sending over that file.

I believe that there are other users in the community who face a similar
problem, where the central team managing Spark clusters has difficulty
helping the end users because they ran their application in shell or Yarn
client mode (I am not sure what the equivalent is in Mesos).

Additionally, there may be teams who want to capture all these logs so that
they can be analyzed at some later point in time. Because driver logs are
not part of the Yarn logs, these teams either capture only partial logs or
find it difficult to capture all of them.

*Proposed Solution:*
One "low touch" approach would be to create an ApplicationListener which
listens for Application Start and Application End events. On Application
Start, this listener would add a log appender which writes to a local or
remote (e.g. HDFS) log file in an application-specific directory. On
Application End, it would move this file to Yarn's Remote Application Dir
(or the equivalent Mesos dir). This way the logs would be available as part
of the Yarn logs.
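
To make the lifecycle concrete, here is a rough sketch of the idea. It is
not Spark code: it uses Python's standard logging module as a stand-in for
the log appender, and plain local directories as a stand-in for Yarn's
Remote Application Dir. The class name DriverLogCapture, its method names,
and the application id are all hypothetical and only illustrate the
start/end flow; a real version would hook into Spark's listener events and
a log4j appender instead.

```python
import logging
import shutil
import tempfile
from pathlib import Path


class DriverLogCapture:
    """Hypothetical stand-in for the proposed listener (not a Spark API)."""

    def __init__(self, app_id, local_root, remote_app_dir):
        # Application-specific local file that receives the driver logs.
        self.local_file = Path(local_root) / app_id / "driver.log"
        # Assumed stand-in for Yarn's remote application directory.
        self.remote_dir = Path(remote_app_dir) / app_id
        self.handler = None

    def on_application_start(self):
        # Attach a file handler to the root logger, next to the console output.
        self.local_file.parent.mkdir(parents=True, exist_ok=True)
        self.handler = logging.FileHandler(str(self.local_file))
        self.handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logging.getLogger().addHandler(self.handler)

    def on_application_end(self):
        # Detach and close the handler so the file is fully flushed,
        # then move it next to the rest of the application's logs.
        logging.getLogger().removeHandler(self.handler)
        self.handler.close()
        self.remote_dir.mkdir(parents=True, exist_ok=True)
        dest = self.remote_dir / self.local_file.name
        shutil.move(str(self.local_file), str(dest))
        return dest


def run_demo():
    # Temporary directories play the role of local disk and remote storage.
    work = Path(tempfile.mkdtemp())
    capture = DriverLogCapture("application_0001", work / "local", work / "remote")
    logging.getLogger().setLevel(logging.INFO)

    capture.on_application_start()
    logging.getLogger("driver").info("job submitted")
    return capture.on_application_end()
```

Calling run_demo() returns the path of the moved log file. The main design
point is that the handler is closed before the move, so nothing is lost if
the application ends while log lines are still buffered.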

I am also interested in hearing about other ideas that the community may
have about this. Or if someone has already solved this problem, then I
would like them to contribute their solution to the community.

Thanks,
Ankur
