[
https://issues.apache.org/jira/browse/KYLIN-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236073#comment-17236073
]
ASF GitHub Bot commented on KYLIN-4813:
---------------------------------------
hit-lacus edited a comment on pull request #1481:
URL: https://github.com/apache/kylin/pull/1481#issuecomment-731095920
## Cause Analysis
#### spark-submit command in NSparkExecutable
```sh
2020-11-20 18:22:10,342 INFO [Scheduler 1960610874 Job
79331eef-a64f-411a-a8b5-f8696d301438-104] job.NSparkExecutable:41 : cmd:
2020-11-20 18:22:10,342 INFO [Scheduler 1960610874 Job
79331eef-a64f-411a-a8b5-f8696d301438-104] job.NSparkExecutable:41 : export
HADOOP_CONF_DIR=/root/lib/kylin-dist/apache-kylin-4.0.0-SNAPSHOT-bin/hadoop_conf
&&
/root/open-source/deploy/kylin-instances/spark-2.4.6-bin-hadoop2.7/bin/spark-submit
--class org.apache.kylin.engine.spark.application.SparkEntry
--conf 'spark.executor.instances=1'
--conf 'spark.yarn.queue=default'
--conf 'spark.history.fs.logDirectory=hdfs:///kylin/spark-history'
--conf 'spark.master=yarn'
--conf 'spark.executor.extraJavaOptions=-Dfile.encoding=UTF-8
-Dhdp.version=current -Dlog4j.configuration=spark-executor-log4j.properties
-Dlog4j.debug -Dkylin.hdfs.working.dir=${kylin.env.hdfs-working-dir}
-Dkylin.metadata.identifier=${kylin.metadata.url.identifier}
-Dkylin.spark.category=job -Dkylin.spark.project=${job.project}
-Dkylin.spark.identifier=${job.id} -Dkylin.spark.jobName=${job.stepId}
-Duser.timezone=${user.timezone}'
--conf 'spark.hadoop.yarn.timeline-service.enabled=false'
--conf 'spark.driver.cores=1' --conf 'spark.executor.memory=4G'
--conf 'spark.eventLog.enabled=true'
--conf 'spark.eventLog.dir=hdfs:///kylin/spark-history'
--conf 'spark.executor.cores=1'
--conf 'spark.executor.memoryOverhead=1024M'
--conf 'spark.driver.memory=1G'
--conf 'spark.shuffle.service.enabled=true'
--conf
'spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/root/lib/kylin-dist/apache-kylin-4.0.0-SNAPSHOT-bin/conf/spark-driver-log4j.properties
-Dkylin.kerberos.enabled=false
-Dkylin.hdfs.working.dir=hdfs://cdh-master:8020/regression_testing/400alpha/xxyu/
-Dspark.driver.log4j.appender.hdfs.File=hdfs://cdh-master:8020/regression_testing/400alpha/xxyu/learn_kylin/spark_logs/driver/79331eef-a64f-411a-a8b5-f8696d301438-01/execute_output.json.1605867729760.log
-Dspark.driver.rest.server.ip=10.1.3.90 -Dspark.driver.rest.server.port=7070
-Dspark.driver.param.taskId=79331eef-a64f-411a-a8b5-f8696d301438-01
-Dspark.driver.local.logDir=/root/lib/kylin-dist/apache-kylin-4.0.0-SNAPSHOT-bin/logs/spark'
--conf 'spark.executor.extraClassPath=kylin-parquet-job-4.0.0-SNAPSHOT.jar'
--conf
'spark.driver.extraClassPath=/root/lib/kylin-dist/apache-kylin-4.0.0-SNAPSHOT-bin/lib/kylin-parquet-job-4.0.0-SNAPSHOT.jar'
--files
/root/lib/kylin-dist/apache-kylin-4.0.0-SNAPSHOT-bin/conf/spark-executor-log4j.properties
--name job_step_79331eef-a64f-411a-a8b5-f8696d301438-01
--jars
/root/lib/kylin-dist/apache-kylin-4.0.0-SNAPSHOT-bin/lib/kylin-parquet-job-4.0.0-SNAPSHOT.jar
/root/lib/kylin-dist/apache-kylin-4.0.0-SNAPSHOT-bin/lib/kylin-parquet-job-4.0.0-SNAPSHOT.jar
-className org.apache.kylin.engine.spark.job.CubeBuildJob
/root/lib/kylin-dist/apache-kylin-4.0.0-SNAPSHOT-bin/tomcat/temp/segmentIds2791542431972229325
```
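One detail in the command above: the `spark.executor.extraJavaOptions` value still carries literal `${...}` tokens (for example `${job.project}`), while `spark.driver.extraJavaOptions` carries concrete values. A plausible reading, not confirmed by this comment, is that those tokens were never substituted before submission. The sketch below shows one way to detect such leftover tokens in a command string. The helper names `unresolved_placeholders` and `substitute` are hypothetical and are not part of Kylin.

```python
import re
from typing import Dict, List

# Matches ${name} tokens such as ${job.id}; names may contain dots and hyphens.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z0-9_.\-]+)\}")


def unresolved_placeholders(cmd: str) -> List[str]:
    """Return distinct placeholder names still present in cmd, in order of first appearance."""
    seen: List[str] = []
    for name in PLACEHOLDER.findall(cmd):
        if name not in seen:
            seen.append(name)
    return seen


def substitute(cmd: str, values: Dict[str, str]) -> str:
    """Replace each ${name} with values[name]; names without a value are left untouched."""
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), cmd)


# Fragment copied from the executor options in the logged command.
sample = (
    "-Dkylin.hdfs.working.dir=${kylin.env.hdfs-working-dir} "
    "-Dkylin.spark.project=${job.project} "
    "-Dkylin.spark.identifier=${job.id}"
)

print(unresolved_placeholders(sample))
```

A check like this could run just before the command is logged, so that a leftover token is reported at submission time instead of surfacing later as an empty field in the executor log.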
#### Output of executor log
The following variables are empty in the executor log:
- metadataIdentifier
- project
- jobName
- hdfsWorkingDir
```sh
[root@cdh-master kylin]# yarn logs -applicationId
application_1589169585068_30012
20/11/20 18:29:25 INFO client.RMProxy: Connecting to ResourceManager at
cdh-master/10.1.3.90:8032
Container: container_1589169585068_30012_01_000002 on cdh-worker-1_8041
=========================================================================
LogType:stderr
Log Upload Time:Fri Nov 20 18:23:56 +0800 2020
LogLength:3373769
Log Contents:
log4j:WARN No such property [rollingPeriod] in
org.apache.kylin.engine.spark.common.logging.SparkExecutorHdfsAppender.
log4j:WARN SparkExecutorHdfsLogAppender starting ...
log4j:WARN hdfsWorkingDir ->
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/yarn/nm/usercache/root/filecache/835102/__spark_libs__3815910076379456523.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/opt/cloudera/parcels/CDH-5.7.6-1.cdh5.7.6.p0.6/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
log4j:WARN metadataIdentifier ->
log4j:WARN category -> job
log4j:WARN identifier -> application_1589169585068_30012
log4j:WARN project ->
log4j:WARN jobName ->
log4j:WARN SparkExecutorHdfsLogAppender started ...
```
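The empty fields listed above were read off the `log4j:WARN name -> value` lines by eye. As a minimal sketch, the same check can be scripted. It assumes the appender keeps printing fields in exactly this format, and the function name `empty_fields` is hypothetical.

```python
import re
from typing import List

# Matches lines such as "log4j:WARN project ->" or "log4j:WARN category -> job".
FIELD = re.compile(r"^log4j:WARN (\w+) ->\s*(.*)$")


def empty_fields(log_text: str) -> List[str]:
    """Return the names of appender fields whose logged value is empty."""
    empty: List[str] = []
    for line in log_text.splitlines():
        match = FIELD.match(line.strip())
        if match and match.group(2) == "":
            empty.append(match.group(1))
    return empty


# Lines copied from the executor log above.
log_excerpt = """\
log4j:WARN SparkExecutorHdfsLogAppender starting ...
log4j:WARN hdfsWorkingDir ->
log4j:WARN metadataIdentifier ->
log4j:WARN category -> job
log4j:WARN identifier -> application_1589169585068_30012
log4j:WARN project ->
log4j:WARN jobName ->
log4j:WARN SparkExecutorHdfsLogAppender started ...
"""

print(empty_fields(log_excerpt))
```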


> Refine spark logger for Kylin 4 build engine
> --------------------------------------------
>
> Key: KYLIN-4813
> URL: https://issues.apache.org/jira/browse/KYLIN-4813
> Project: Kylin
> Issue Type: Improvement
> Affects Versions: v4.0.0-alpha
> Reporter: Xiaoxiang Yu
> Assignee: Yaqian Zhang
> Priority: Major
> Fix For: v4.0.0-beta
>
>
> - Separate the Spark log from the Kylin log.
> - Store driver/executor logs in HDFS.
> - Provide an API to view the driver log.