One issue I can think of is that moving the driver log at application end
is quite time-consuming and will significantly delay shutdown. We already
suffer from this kind of "rename" problem with the event log on object
stores, where a rename is typically a copy followed by a delete; moving the
driver log as well will make the problem more severe.

For a vanilla Spark on YARN client-mode application, I think the user could
redirect the console output to a log file and provide both the driver log
and the YARN application log to the customers; this does not seem like a
big overhead.
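
As a rough sketch of that redirection (not a definitive recipe): the
function name run_with_driver_log, the file name driver.log, and the class
and jar names below are placeholders I made up for illustration. It relies
on Spark's default console appender writing to stderr, so stderr is merged
into stdout before being copied to the file.

```shell
# Hypothetical helper: run a command, show its output on the console,
# and keep a copy of the same output in a log file.
run_with_driver_log() {
  log_file="$1"
  shift
  # Merge stderr into stdout (Spark's default console appender targets
  # stderr), then duplicate the stream to the console and to the file.
  "$@" 2>&1 | tee "$log_file"
}

# Example invocation (class name and jar are placeholders):
# run_with_driver_log driver.log \
#   spark-submit --master yarn --deploy-mode client \
#   --class com.example.MyApp my-app.jar
```

Note that the pipeline returns the exit status of tee rather than that of
the wrapped command, so a caller that needs the application's exit code
would have to capture it separately.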

Just my two cents.

Thanks
Saisai

Ankur Gupta <ankur.gu...@cloudera.com.invalid> wrote on Wed, Aug 22, 2018 at 5:19 AM:

> Hi all,
>
> I want to highlight a problem that we face here at Cloudera and start a
> discussion on how to go about solving it.
>
> *Problem Statement:*
> Our customers reach out to us when they face problems in their Spark
> Applications. Those problems can be related to Spark, environment issues,
> their own code or something else altogether. A lot of times these customers
> run their Spark Applications in Yarn Client mode, which as we all know,
> uses a ConsoleAppender to print logs to the console. These customers
> usually send their Yarn logs to us to troubleshoot. As you may have
> figured, these logs do not contain driver logs, which makes it difficult
> for us to troubleshoot the issue. In that scenario our customers end up running
> the application again, piping the output to a log file or using a local log
> appender and then sending over that file.
>
> I believe that there are other users in the community who also face a
> similar problem, where the central team managing Spark clusters has
> difficulty helping the end users because they ran their application in
> shell or yarn client mode (I am not sure what the equivalent is in Mesos).
>
> Additionally, there may be teams who want to capture all these logs so
> that they can be analyzed at some later point in time and the fact that
> driver logs are not a part of Yarn Logs causes them to capture only partial
> logs or makes it difficult to capture all the logs.
>
> *Proposed Solution:*
> One "low touch" approach would be to create an ApplicationListener which
> listens for Application Start and Application End events. On Application
> Start, this listener will attach a Log Appender which writes to a local or
> remote (e.g. HDFS) log file in an application-specific directory. On
> Application End, it moves this file to Yarn's Remote Application Dir (or
> the equivalent Mesos Dir). This way the logs will be available as part of
> Yarn Logs.
>
> I am also interested in hearing about other ideas that the community may
> have about this. Or if someone has already solved this problem, then I
> would like them to contribute their solution to the community.
>
> Thanks,
> Ankur
>
