Hi all,

I want to highlight a problem that we face here at Cloudera and start a discussion on how to solve it.
*Problem Statement:* Our customers reach out to us when they run into problems with their Spark applications. These problems can be caused by Spark itself, by the environment, by their own code, or by something else altogether. Many of these customers run their Spark applications in YARN client mode. In that mode the driver runs on the client machine and, with Spark's default log4j configuration, writes its logs to the console through a ConsoleAppender. To help us troubleshoot, these customers usually send us their YARN logs. However, the YARN logs do not contain the driver logs, which makes troubleshooting difficult. In that case our customers end up re-running the application, either piping the output to a log file or configuring a local file appender, and then sending us that file.

I believe other users in the community face a similar problem: a central team that manages Spark clusters has difficulty helping end users who ran their applications in the shell or in YARN client mode (I am not sure what the equivalent is in Mesos). Additionally, some teams may want to capture all of these logs so they can analyze them later. Because driver logs are not part of the YARN logs, those teams either capture only partial logs or have to go to extra effort to capture everything.

*Proposed Solution:* One "low touch" approach would be to create an application listener (a SparkListener) that listens for the application start and application end events. On application start, the listener would add a log appender that writes to a local or remote (e.g. HDFS) log file in an application-specific directory. On application end, it would move that file to YARN's remote application log directory (or the equivalent Mesos directory). This way the driver logs would be available as part of the YARN logs.

I would also like to hear other ideas the community may have about this. And if someone has already solved this problem, I would welcome them contributing their solution to the community.

Thanks,
Ankur
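For illustration, here is a rough sketch of the listener idea in Scala. It is not a tested implementation. The class name DriverLogListener, the local file location, the log pattern, and the destination layout under yarn.nodemanager.remote-app-log-dir are all assumptions. It also assumes Spark 2.x (log4j 1.x on the driver classpath) with the Hadoop client libraries available. Aggregated YARN logs are normally stored in a container file format (TFile by default), so whether "yarn logs" would display a plain file copied into that directory is also an assumption that would need to be verified.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.log4j.{FileAppender, Logger, PatternLayout}
import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd, SparkListenerApplicationStart}

// Hypothetical listener (name and file layout are illustrative assumptions).
// Assumes Spark 2.x, i.e. log4j 1.x on the driver classpath.
class DriverLogListener extends SparkListener {
  private var appender: Option[FileAppender] = None
  private var localFile: Option[String] = None
  private var appId: Option[String] = None
  private var user: String = "unknown"

  override def onApplicationStart(event: SparkListenerApplicationStart): Unit = {
    appId = event.appId
    user = event.sparkUser
    // Capture driver logs in a local, application-specific file.
    val file = s"${System.getProperty("java.io.tmpdir")}/${appId.getOrElse("unknown-app")}-driver.log"
    val layout = new PatternLayout("%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n")
    val fileAppender = new FileAppender(layout, file, false) // false = overwrite, do not append
    Logger.getRootLogger.addAppender(fileAppender)
    appender = Some(fileAppender)
    localFile = Some(file)
  }

  override def onApplicationEnd(event: SparkListenerApplicationEnd): Unit = {
    // Detach and close the appender so the local file is flushed.
    appender.foreach { a =>
      Logger.getRootLogger.removeAppender(a)
      a.close()
    }
    appender = None

    for (file <- localFile; id <- appId) {
      val conf = new Configuration()
      conf.addResource("yarn-site.xml") // yarn.* keys are normally defined here
      val root = conf.get("yarn.nodemanager.remote-app-log-dir", "/tmp/logs")
      val suffix = conf.get("yarn.nodemanager.remote-app-log-dir-suffix", "logs")
      // Assumed destination layout: <root>/<user>/<suffix>/<appId>/driver.log
      val destDir = new Path(root, s"$user/$suffix/$id")
      val fs = destDir.getFileSystem(conf)
      fs.mkdirs(destDir)
      // delSrc = true deletes the local copy after a successful upload.
      fs.copyFromLocalFile(true, new Path(file), new Path(destDir, "driver.log"))
    }
    localFile = None
  }
}
```

Such a listener could be registered with --conf spark.extraListeners=DriverLogListener. Because the appender is only attached when the application start event is delivered, log lines emitted earlier during SparkContext initialization would not be captured by this sketch.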