[
https://issues.apache.org/jira/browse/HIVE-13376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303379#comment-15303379
]
Rui Li commented on HIVE-13376:
-------------------------------
[~xuefuz], [~szehon] - I ran some more tests on this and want to correct
some of my previous comments:
# In yarn-cluster mode, {{SparkSubmit}} runs the {{Client}}. The Client keeps
checking the app state and printing it to the log. On the Hive side, we read
SparkSubmit's stdout and stderr (the process's input and error streams) and
print them to the Hive log.
# In yarn-client mode, {{SparkSubmit}} runs our {{RemoteDriver}}. RemoteDriver
waits for the app to start running and then serves the job requests from Hive.
It doesn't report the app state after that.
# The verbose logging only happens with yarn-cluster mode.
# The long interval only affects yarn-client mode.
# To avoid the state reports in yarn-cluster mode, we can either raise the log
level (e.g. WARN instead of INFO), or set
{{spark.yarn.submit.waitAppCompletion=false}} so that {{SparkSubmit}}
terminates once it has submitted the app to the RM.
I'd prefer disabling {{spark.yarn.submit.waitAppCompletion}}, if it doesn't
cause any other trouble.
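As a rough illustration of the second option, the sketch below turns the
property into {{--conf}} arguments for a {{spark-submit}} command line. The
class and method names ({{SubmitConfSketch}}, {{quietYarnClusterConf}},
{{toSubmitArgs}}) are hypothetical and are not taken from Hive's code; only
the property name and value come from the comment above. How Hive actually
passes the setting to {{SparkSubmit}} may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Hive's implementation.
public class SubmitConfSketch {

    // Spark properties for a submission that should not keep polling the
    // app state: with waitAppCompletion=false, SparkSubmit is expected to
    // exit once the app has been submitted to the RM.
    static Map<String, String> quietYarnClusterConf() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("spark.yarn.submit.waitAppCompletion", "false");
        return conf;
    }

    // Render each property as a "--conf key=value" argument pair.
    static List<String> toSubmitArgs(Map<String, String> conf) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            args.add("--conf");
            args.add(e.getKey() + "=" + e.getValue());
        }
        return args;
    }

    public static void main(String[] args) {
        // Prints the extra arguments to append to the spark-submit command.
        System.out.println(String.join(" ", toSubmitArgs(quietYarnClusterConf())));
    }
}
```

The same key/value pair could equally be written to a Spark properties file
instead of the command line; the sketch only shows the setting itself.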
> HoS emits too many logs with application state
> ----------------------------------------------
>
> Key: HIVE-13376
> URL: https://issues.apache.org/jira/browse/HIVE-13376
> Project: Hive
> Issue Type: Improvement
> Components: Spark
> Reporter: Szehon Ho
> Assignee: Szehon Ho
> Fix For: 2.1.0
>
> Attachments: HIVE-13376.2.patch, HIVE-13376.patch
>
>
> The logs get flooded with something like:
> > Mar 28, 3:12:21.851 PM INFO
> > org.apache.hive.spark.client.SparkClientImpl
> > [stderr-redir-1]: 16/03/28 15:12:21 INFO yarn.Client: Application report
> > for application_1458679386200_0161 (state: RUNNING)
> > Mar 28, 3:12:21.912 PM INFO
> > org.apache.hive.spark.client.SparkClientImpl
> > [stderr-redir-1]: 16/03/28 15:12:21 INFO yarn.Client: Application report
> > for application_1458679386200_0149 (state: RUNNING)
> > Mar 28, 3:12:22.853 PM INFO
> > org.apache.hive.spark.client.SparkClientImpl
> > [stderr-redir-1]: 16/03/28 15:12:22 INFO yarn.Client: Application report
> > for application_1458679386200_0161 (state: RUNNING)
> > Mar 28, 3:12:22.913 PM INFO
> > org.apache.hive.spark.client.SparkClientImpl
> > [stderr-redir-1]: 16/03/28 15:12:22 INFO yarn.Client: Application report
> > for application_1458679386200_0149 (state: RUNNING)
> > Mar 28, 3:12:23.855 PM INFO
> > org.apache.hive.spark.client.SparkClientImpl
> > [stderr-redir-1]: 16/03/28 15:12:23 INFO yarn.Client: Application report
> > for application_1458679386200_0161 (state: RUNNING)
> While this is good information, it is a bit much.
> Seems like SparkJobMonitor hard-codes its interval to 1 second. It should be
> higher and perhaps made configurable.