gcnyin opened a new issue, #11304: URL: https://github.com/apache/dolphinscheduler/issues/11304
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.

### What happened

I have already created DolphinScheduler in k8s using Helm, and installed Spark in `/opt/soft/spark1` and `/opt/soft/spark2`. Spark version: 3.2.2. K8s version: 1.23.

I submitted a spark-sql task and started it, and it threw the error below, although I can run the same SQL correctly on my local machine. It looks like DolphinScheduler doesn't correctly read the `SPARK_HOME2` environment variable.

```
[LOG-PATH]: /opt/dolphinscheduler/logs/20220802/6402332966880_2-1-11.log, [HOST]: Host{address='dolphinscheduler-worker-2.dolphinscheduler-worker-headless:1234', ip='dolphinscheduler-worker-2.dolphinscheduler-worker-headless', port=1234}
[INFO] 2022-08-02 22:48:15.754 +0800 [taskAppId=TASK-20220802-6402332966880_2-1-11] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.spark.SparkTask:[69] - spark task params {"localParams":[],"rawScript":"create table ods_space\n(\n project_uuid binary,\n space_uuid binary,\n type string,\n is_deleted boolean\n) using jdbc options (\n dbtable = \"xxx\",\n driver = \"com.mysql.cj.jdbc.Driver\",\n url = \"jdbc:mysql://xxx:3306\",\n user = \"xxx\",\n password = \"xxx\"\n);\n\ncreate table ods_space\n(\n project_uuid binary,\n date date,\n count int\n) using jdbc options (\n dbtable = \"xxx\",\n driver = \"com.mysql.cj.jdbc.Driver\",\n url = \"jdbc:mysql://xxx:3306\",\n user = \"xxx\",\n password = \"xxx\"\n);\n\ninsert into ods_space\nfrom (select project_uuid,\n count(1)\n from ods_space\n where type = 'room'\n group by project_uuid);\n","resourceList":[],"programType":"SQL","mainClass":"","deployMode":"local","appName":"sta_analysis_job","sparkVersion":"SPARK2","driverCores":1,"driverMemory":"512M","numExecutors":2,"executorMemory":"2G","executorCores":2}
[INFO] 2022-08-02 22:48:15.770 +0800 [taskAppId=TASK-20220802-6402332966880_2-1-11] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.spark.SparkTask:[240] - raw script : create table ods_space ( project_uuid binary, space_uuid binary, type string ) using jdbc options ( dbtable = "xxx", driver = "com.mysql.cj.jdbc.Driver", url = "jdbc:mysql://xxx:3306", user = "xxx", password = "xxx" ); create table ods_space ( project_uuid binary, date date, count int ) using jdbc options ( dbtable = "xxx", driver = "com.mysql.cj.jdbc.Driver", url = "jdbc:mysql://xxx:3306", user = "xxx", password = "xxx" ); insert into ods_space from (select project_uuid, count(1) from ods_space where type = 'room' group by project_uuid);
[INFO] 2022-08-02 22:48:15.771 +0800 [taskAppId=TASK-20220802-6402332966880_2-1-11] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.spark.SparkTask:[241] - task execute path : /tmp/dolphinscheduler/exec/process/6397603455328/6402332966880_2/1/11
[INFO] 2022-08-02 22:48:15.776 +0800 [taskAppId=TASK-20220802-6402332966880_2-1-11] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.spark.SparkTask:[130] - spark task command: ${SPARK_HOME2}/bin/spark-sql --master local --driver-cores 1 --driver-memory 512M --num-executors 2 --executor-cores 2 --executor-memory 2G --name sta_analysis_job -f /tmp/dolphinscheduler/exec/process/6397603455328/6402332966880_2/1/11/1_11_node.sql
[INFO] 2022-08-02 22:48:15.777 +0800 [taskAppId=TASK-20220802-6402332966880_2-1-11] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.spark.SparkTask:[85] - tenantCode user:tenant-01, task dir:1_11
[INFO] 2022-08-02 22:48:15.777 +0800 [taskAppId=TASK-20220802-6402332966880_2-1-11] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.spark.SparkTask:[90] - create command file:/tmp/dolphinscheduler/exec/process/6397603455328/6402332966880_2/1/11/1_11.command
[INFO] 2022-08-02 22:48:15.777 +0800 [taskAppId=TASK-20220802-6402332966880_2-1-11] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.spark.SparkTask:[116] - command :
#!/bin/sh
BASEDIR=$(cd `dirname $0`; pwd)
cd $BASEDIR
source /opt/dolphinscheduler/conf/dolphinscheduler_env.sh
${SPARK_HOME2}/bin/spark-sql --master local --driver-cores 1 --driver-memory 512M --num-executors 2 --executor-cores 2 --executor-memory 2G --name sta_analysis_job -f /tmp/dolphinscheduler/exec/process/6397603455328/6402332966880_2/1/11/1_11_node.sql
[INFO] 2022-08-02 22:48:15.802 +0800 [taskAppId=TASK-20220802-6402332966880_2-1-11] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.spark.SparkTask:[290] - task run command: sudo -u tenant-01 sh /tmp/dolphinscheduler/exec/process/6397603455328/6402332966880_2/1/11/1_11.command
[INFO] 2022-08-02 22:48:15.805 +0800 [taskAppId=TASK-20220802-6402332966880_2-1-11] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.spark.SparkTask:[181] - process start, process id is: 201
[INFO] 2022-08-02 22:48:15.816 +0800 [taskAppId=TASK-20220802-6402332966880_2-1-11] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.spark.SparkTask:[205] - process has exited, execute path:/tmp/dolphinscheduler/exec/process/6397603455328/6402332966880_2/1/11, processId:201 ,exitStatusCode:127 ,processWaitForStatus:true ,processExitValue:127
[INFO] 2022-08-02 22:48:16.805 +0800 [taskAppId=TASK-20220802-6402332966880_2-1-11] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.spark.SparkTask:[63] - -> /tmp/dolphinscheduler/exec/process/6397603455328/6402332966880_2/1/11/1_11.command: 4: /tmp/dolphinscheduler/exec/process/6397603455328/6402332966880_2/1/11/1_11.command: source: not found
	/tmp/dolphinscheduler/exec/process/6397603455328/6402332966880_2/1/11/1_11.command: 5: /tmp/dolphinscheduler/exec/process/6397603455328/6402332966880_2/1/11/1_11.command: /bin/spark-sql: not found
[INFO] 2022-08-02 22:48:16.808 +0800 [taskAppId=TASK-20220802-6402332966880_2-1-11] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.spark.SparkTask:[57] - FINALIZE_SESSION
```

I created a soft link from `/bin/spark-sql` to `${SPARK_HOME2}/bin/spark-sql`, which fixed that, but then it threw another similar error:

```
${SPARK_HOME2}/bin/spark-submit not found
```

### What you expected to happen

The spark-sql task runs correctly.

### How to reproduce

Create any spark-sql task and run it.

### Anything else

None

### Version

3.0.0-beta-2

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
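For context on why the failing path is literally `/bin/spark-sql`: a minimal sketch in plain `sh` (not DolphinScheduler code) showing that if `dolphinscheduler_env.sh` is never loaded, `SPARK_HOME2` is unset and expands to an empty string, so the templated command collapses to the exact path reported in the log.

```shell
#!/bin/sh
# Sketch only: with SPARK_HOME2 unset, the templated command
# "${SPARK_HOME2}/bin/spark-sql" collapses to the literal path
# "/bin/spark-sql" -- the same "not found" path shown in the log.
unset SPARK_HOME2
cmd="${SPARK_HOME2}/bin/spark-sql"
echo "$cmd"
```

Exit status 127 is the shell's "command not found" code, which is consistent with the `exitStatusCode:127` in the log above.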

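The `source: not found` message on line 4 of the generated `1_11.command` suggests the file is executed by a strictly POSIX shell (the log runs it via `sh`, and `/bin/sh` is dash rather than bash in many container images), where `source` is not a builtin; the portable spelling is `.`. A small sketch of the difference, using a throwaway stand-in file rather than the real `dolphinscheduler_env.sh`:

```shell
#!/bin/sh
# Hypothetical stand-in for dolphinscheduler_env.sh (the real file
# ships in the worker image); the export value is assumed here.
envfile="$(mktemp)"
echo 'export SPARK_HOME2=/opt/soft/spark2' > "$envfile"

# "." is the POSIX way to read a file into the current shell;
# "source" is a bash/zsh spelling of it that dash does not provide.
. "$envfile"
echo "SPARK_HOME2=$SPARK_HOME2"

rm -f "$envfile"
```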