[ https://issues.apache.org/jira/browse/KYLIN-4895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365722#comment-17365722 ]

ASF GitHub Bot commented on KYLIN-4895:
---------------------------------------

zhangayqian edited a comment on pull request #1568:
URL: https://github.com/apache/kylin/pull/1568#issuecomment-863869542


   When testing this patch, I encountered the following error in the first step 
of the build job:
   <img width="1303" alt="image" src="https://user-images.githubusercontent.com/31064237/122533007-ab50e300-d053-11eb-8192-5fdef3c699b3.png">
   
   `Detect Resource` only supports `spark.master=local`; with that master, `spark.submit.deployMode` cannot be set to `cluster`.
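
   To make the conflict concrete, this is roughly the configuration combination involved (a sketch assuming the Kylin 4 `kylin.engine.spark-conf.*` override prefix; adjust to your actual properties):
   ```
   # kylin.properties: this patch moves the build driver onto YARN ...
   kylin.engine.spark-conf.spark.submit.deployMode=cluster

   # ... but the Detect Resource step submits locally, which is only
   # valid with:
   #   spark.master=local
   #   spark.submit.deployMode=client (the default)
   ```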
   
   I tried adding `spark.submit.deployMode` to `excludedSparkConf` in class `NSparkLocalStep` and retested; the first step then succeeded.
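
   In effect, that change filters the deploy-mode key out of the conf passed to the local `Detect Resource` submission. A minimal sketch of that filtering (illustrative names only; the real field lives in Kylin's `NSparkLocalStep`):
   ```java
   import java.util.HashMap;
   import java.util.Map;
   import java.util.Set;

   public class ExcludedConfDemo {
       // Hypothetical stand-in for NSparkLocalStep's excludedSparkConf:
       // keys that must not reach a spark.master=local submission.
       static final Set<String> EXCLUDED_SPARK_CONF = Set.of(
               "spark.master",
               "spark.submit.deployMode");  // the key added while testing this patch

       static Map<String, String> filterForLocalStep(Map<String, String> sparkConf) {
           // Copy the conf and drop every excluded key before submitting locally.
           Map<String, String> filtered = new HashMap<>(sparkConf);
           filtered.keySet().removeAll(EXCLUDED_SPARK_CONF);
           return filtered;
       }

       public static void main(String[] args) {
           Map<String, String> conf = new HashMap<>();
           conf.put("spark.submit.deployMode", "cluster");
           conf.put("spark.executor.memory", "4g");
           System.out.println(filterForLocalStep(conf));
       }
   }
   ```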
   
   But at the `Build cube with spark` step, the following error was reported:
   ```
   throwable : java.lang.RuntimeException: Error execute org.apache.kylin.engine.spark.job.CubeBuildJob
        at org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:101)
        at org.apache.spark.application.JobWorker$$anon$2.run(JobWorker.scala:55)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
   Caused by: java.lang.IllegalArgumentException: Can not create a Path from an empty string
        at org.apache.hadoop.fs.Path.checkPathArg(Path.java:126)
        at org.apache.hadoop.fs.Path.<init>(Path.java:134)
        at org.apache.hadoop.fs.Path.<init>(Path.java:93)
        at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:383)
        at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:477)
        at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:496)
        at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:865)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:179)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:188)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:501)
        at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
        at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
        at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
        at org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:303)
        at org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:98)
   ```
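
   The `Caused by` frame points at `Client.distribute`, which copies the submit-time file lists (the application jar, entries of conf such as `spark.yarn.dist.files`) to the cluster; an empty segment in one of those comma-separated lists would produce exactly this `new Path("")` failure. A defensive sketch (hypothetical helper, not Kylin code) that strips empty segments before submission:
   ```java
   import java.util.Arrays;
   import java.util.stream.Collectors;

   public class DistFilesGuard {
       // Drop empty segments from a comma-separated file list (e.g. a value
       // destined for spark.yarn.dist.files), so no empty string ever reaches
       // new Path(...) during YARN resource distribution.
       static String dropEmptyEntries(String commaSeparated) {
           if (commaSeparated == null) return "";
           return Arrays.stream(commaSeparated.split(","))
                   .map(String::trim)
                   .filter(s -> !s.isEmpty())
                   .collect(Collectors.joining(","));
       }

       public static void main(String[] args) {
           // A trailing comma or doubled comma is a typical cause of the error above.
           System.out.println(dropEmptyEntries("hdfs:///kylin/a.jar,,hdfs:///kylin/b.jar,"));
           // prints hdfs:///kylin/a.jar,hdfs:///kylin/b.jar
       }
   }
   ```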


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


> change spark deploy mode of kylin4.0 engine from local to cluster
> -----------------------------------------------------------------
>
>                 Key: KYLIN-4895
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4895
>             Project: Kylin
>          Issue Type: New Feature
>          Components: Job Engine
>    Affects Versions: v4.0.0-alpha
>            Reporter: tianhui
>            Assignee: tianhui
>            Priority: Major
>             Fix For: v4.0.0
>
>
>     In a cloud-native environment, pod memory is quite limited, but the Spark driver in the job engine pod can use a huge amount of memory, which becomes difficult to manage as the number of projects grows.
>     So it is better to run the Spark driver on YARN and have Kylin only track its status.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
