Re: Persisting driver logs in yarn client mode (SPARK-25118)

Marco Gaido Wed, 22 Aug 2018 01:36:36 -0700

I agree with Saisai. You can also configure log4j to append somewhere
other than the console. Many companies have their own systems for collecting
and monitoring logs, and they simply customize the log4j configuration. I am
not sure how much this change is needed.


Thanks,
Marco

On Wed, 22 Aug 2018 at 04:31, Saisai Shao <sai.sai.s...@gmail.com>
wrote:

> One issue I can think of is that moving the driver log at application
> end is quite time-consuming, which will significantly delay the
> shutdown. We have already suffered from this "rename" problem with the event
> log on object stores; moving the driver log as well will make it worse.
>
> For a vanilla Spark on yarn client application, I think users could
> redirect the console output to a log file and provide both the driver log
> and the yarn application log to the customers; this does not seem like a
> big overhead.
>
> Just my two cents.
>
> Thanks
> Saisai
>
> Ankur Gupta <ankur.gu...@cloudera.com.invalid> wrote on Wed, 22 Aug 2018 at 5:19 AM:
>
>> Hi all,
>>
>> I want to highlight a problem that we face here at Cloudera and start a
>> discussion on how to go about solving it.
>>
>> *Problem Statement:*
>> Our customers reach out to us when they face problems in their Spark
>> Applications. Those problems can be related to Spark, environment issues,
>> their own code or something else altogether. A lot of times these customers
>> run their Spark Applications in Yarn Client mode, which as we all know,
>> uses a ConsoleAppender to print logs to the console. These customers
>> usually send their Yarn logs to us to troubleshoot. As you may have
>> figured, these logs do not contain driver logs, which makes it difficult for
>> us to troubleshoot the issue. In that scenario our customers end up running
>> the application again, piping the output to a log file or using a local log
>> appender and then sending over that file.
>>
>> I believe that there are other users in the community who also face
>> a similar problem, where the central team managing Spark clusters faces
>> difficulty in helping the end users because they ran their application in
>> shell or yarn client mode (I am not sure what the equivalent is in Mesos).
>>
>> Additionally, there may be teams who want to capture all these logs so
>> that they can be analyzed at some later point in time and the fact that
>> driver logs are not a part of Yarn Logs causes them to capture only partial
>> logs or makes it difficult to capture all the logs.
>>
>> *Proposed Solution:*
>> One "low touch" approach would be to create an ApplicationListener which
>> listens for Application Start and Application End events. On Application
>> Start, this listener would add a Log Appender which writes to a local or
>> remote (e.g. HDFS) log file in an application-specific directory. On
>> Application End, it would move this file to Yarn's Remote Application Dir
>> (or the equivalent Mesos Dir). This way the driver logs would be available
>> as part of the Yarn Logs.
>>
>> I am also interested in hearing about other ideas that the community may
>> have about this. Or if someone has already solved this problem, then I
>> would like them to contribute their solution to the community.
>>
>> Thanks,
>> Ankur
>>
>
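
A note on the log4j route Marco mentions above: the following is a minimal
sketch of what such a customization might look like, assuming the log4j 1.x
properties format used by Spark 2.x. The appender name "file", the path
/var/log/spark/driver.log, and the size limits are illustrative assumptions,
not values proposed anywhere in this thread.

```properties
# Send driver logs to both the console and a local rolling file.
# "file" is a hypothetical appender name chosen for this sketch.
log4j.rootCategory=INFO, console, file

# Console appender, as in Spark's default template.
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Rolling file appender; path and limits are assumptions, adjust as needed.
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/var/log/spark/driver.log
log4j.appender.file.MaxFileSize=50MB
log4j.appender.file.MaxBackupIndex=5
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```

In yarn client mode the driver runs inside the spark-submit JVM, so a file
like this would typically be passed with something along the lines of
--driver-java-options "-Dlog4j.configuration=file:/path/to/log4j.properties"
(the path here is hypothetical). This keeps the console output and adds a
local copy, but it does not place the driver log alongside the Yarn
application logs, which is the gap the original proposal targets.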
