Hi Kudryavtsev,
Here's what I'm doing as a common practice and reference. I don't want to call
it best practice, since that requires a lot of customer experience and
feedback, but from a development and operations standpoint it helps a great
deal to separate the YARN container logs from the Spark logs.
Event Log - Use the Spark History Server to look at the workflow, overall
resource usage, etc. for the job.
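For reference, a minimal sketch of the settings that feed the History Server
(these go in spark-defaults.conf; the HDFS path is just an example, pick your
own):

    # spark-defaults.conf (sketch) - turn on event logging for the History Server
    spark.eventLog.enabled  true
    # example path, adjust for your cluster
    spark.eventLog.dir      hdfs:///user/spark/events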
Spark Log - Provides readable info on settings and configuration, much of
which is also covered by the event logs. You can customize this by putting
your own log4j.properties file in the 'conf' folder. Note that it won't be
picked up by your YARN containers, since your Hadoop installation may be
referring to a different log4j file somewhere else.
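As a starting point, here's a minimal log4j.properties sketch for the conf
folder (the levels and pattern are just my own defaults, tune to taste):

    # conf/log4j.properties - driver-side Spark logging (sketch)
    log4j.rootCategory=INFO, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
    # quiet down chatty dependencies
    log4j.logger.org.eclipse.jetty=WARN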
Stderr/Stdout log - This is what the YARN container actually picks up, and you
won't be able to override it unless you replace the one in the resource folder
(yarn/common/src/main/resources/log4j-spark-container.properties) during the
build process and include it in your build (JAR file).
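If you do go the rebuild route, the replacement is plain log4j configuration.
A sketch of what you might drop in (the file appender and path below are my
own assumptions, not what ships with Spark):

    # yarn/common/src/main/resources/log4j-spark-container.properties (sketch)
    log4j.rootCategory=WARN, file
    log4j.appender.file=org.apache.log4j.RollingFileAppender
    # example path, adjust for your cluster
    log4j.appender.file.File=/tmp/spark-container.log
    log4j.appender.file.MaxFileSize=10MB
    log4j.appender.file.MaxBackupIndex=5
    log4j.appender.file.layout=org.apache.log4j.PatternLayout
    log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n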
One thing I haven't tried yet is to separate that resource file into its own
JAR and include it via the ext jar options on HDFS to suppress the log. This
amounts to exploiting the CLASSPATH search behavior to override the YARN
container's log4j settings without rebuilding Spark to include them. I don't
know if this is a good practice, though; it's just an idea that gives people
some flexibility.
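Purely as an untested illustration of that idea (the jar name and HDFS path
are made up):

    # package the container log4j override into its own jar
    jar cf log4j-overrides.jar log4j-spark-container.properties
    # stage it on HDFS
    hadoop fs -put log4j-overrides.jar /libs/log4j-overrides.jar
    # add it to the job's extra jars and hope it wins the classpath search
    spark-submit --jars hdfs:///libs/log4j-overrides.jar ...

There's no guarantee the override jar actually comes first on the container
classpath, which is exactly why I hesitate to call this good practice.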
Anyone else have ideas? Thoughts?
From: kudryavtsev.konstan...@gmail.com
Subject: Spark logging strategy on YARN
Date: Thu, 3 Jul 2014 22:26:48 +0300
To: user@spark.apache.org
Hi all,
Could you please share your best practices for writing logs in Spark? I'm
running it on YARN, so when I check the logs I'm a bit confused…
Currently, I'm writing System.err.println to put a message in the log and
accessing it via the YARN history server. But I don't like this approach… I'd
like to use log4j/slf4j and write to a more concrete place… any practices?
Thank you in advance