monchickey opened a new issue, #16987:
URL: https://github.com/apache/dolphinscheduler/issues/16987

   ### Search before asking
   
   - [x] I had searched in the 
[issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### What happened
   
   DolphinScheduler version: 3.2.2
   Deployment: pseudo-cluster
   Spark is deployed in a standalone cluster, version: 3.5.4
   Resource files are stored using MinIO S3
   The configuration changes involve `api-server/conf/common.properties` and `worker-server/conf/common.properties`; the main changes are as follows:
   ```
   resource.storage.type=S3
   resource.storage.upload.base.path=/dolphinscheduler
   resource.aws.access.key.id=<minio access key>
   resource.aws.secret.access.key=<minio secret key>
   resource.aws.region=cn-north-1
   resource.aws.s3.bucket.name=dolphinscheduler
   resource.aws.s3.endpoint=http://<ip>:9000
   resource.hdfs.root.user=root
   resource.hdfs.fs.defaultFS=s3a://dolphinscheduler
   ```
   The rest of the configuration is kept at its defaults. After starting the services, the jar file can be uploaded normally.
   Then add a SPARK task to the workflow, select the jar package uploaded to MinIO, and choose `cluster` as the deploy mode.
   Then run the workflow instance; the output log is attached:
   
   
[1737699046243.log](https://github.com/user-attachments/files/18531446/1737699046243.log)
   
   The key error messages are:
   ```
   [INFO] 2025-01-24 13:53:34.674 +0800 - *********************************  
Execute task instance  *************************************
   [INFO] 2025-01-24 13:53:34.675 +0800 - 
***********************************************************************************************
   [INFO] 2025-01-24 13:53:34.677 +0800 - Final Shell file is: 
   [INFO] 2025-01-24 13:53:34.677 +0800 - ****************************** Script 
Content *****************************************************************
   [INFO] 2025-01-24 13:53:34.677 +0800 - #!/bin/bash
   BASEDIR=$(cd `dirname $0`; pwd)
   cd $BASEDIR
   export SPARK_HOME=/opt/spark-3.5.4-bin-hadoop3
   ${SPARK_HOME}/bin/spark-submit --master spark://192.168.11.17:7077 
--deploy-mode cluster --class org.apache.spark.examples.JavaSparkPi --conf 
spark.driver.cores=1 --conf spark.driver.memory=512M --conf 
spark.executor.instances=2 --conf spark.executor.cores=2 --conf 
spark.executor.memory=2G 
/tmp/dolphinscheduler/exec/process/default/131329535157952/131329769571008_2/6/6/spark-examples_2.12-3.5.4.jar
   [INFO] 2025-01-24 13:53:34.678 +0800 - ****************************** Script 
Content *****************************************************************
   [INFO] 2025-01-24 13:53:34.678 +0800 - Executing shell command : sudo -u 
default -i 
/tmp/dolphinscheduler/exec/process/default/131329535157952/131329769571008_2/6/6/6_6.sh
   [INFO] 2025-01-24 13:53:34.687 +0800 - process start, process id is: 172698
   [INFO] 2025-01-24 13:53:37.688 +0800 -  -> 
        25/01/24 13:53:37 WARN NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
        25/01/24 13:53:37 INFO SecurityManager: Changing view acls to: default
        25/01/24 13:53:37 INFO SecurityManager: Changing modify acls to: default
        25/01/24 13:53:37 INFO SecurityManager: Changing view acls groups to: 
        25/01/24 13:53:37 INFO SecurityManager: Changing modify acls groups to: 
        25/01/24 13:53:37 INFO SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: default; groups with 
view permissions: EMPTY; users with modify permissions: default; groups with 
modify permissions: EMPTY
   [INFO] 2025-01-24 13:53:38.691 +0800 -  -> 
        25/01/24 13:53:37 INFO Utils: Successfully started service 
'driverClient' on port 39639.
        25/01/24 13:53:37 INFO TransportClientFactory: Successfully created 
connection to /192.168.11.17:7077 after 57 ms (0 ms spent in bootstraps)
        25/01/24 13:53:38 INFO ClientEndpoint: ... waiting before polling 
master for driver state
        25/01/24 13:53:38 INFO ClientEndpoint: Driver successfully submitted as 
driver-20250124135338-0056
   [INFO] 2025-01-24 13:53:43.693 +0800 -  -> 
        25/01/24 13:53:43 INFO ClientEndpoint: State of 
driver-20250124135338-0056 is ERROR
        25/01/24 13:53:43 ERROR ClientEndpoint: Exception from cluster was: 
java.nio.file.NoSuchFileException: 
/tmp/dolphinscheduler/exec/process/default/131329535157952/131329769571008_2/6/6/spark-examples_2.12-3.5.4.jar
        java.nio.file.NoSuchFileException: 
/tmp/dolphinscheduler/exec/process/default/131329535157952/131329769571008_2/6/6/spark-examples_2.12-3.5.4.jar
                at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
                at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
                at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
                at sun.nio.fs.UnixCopyFile.copy(UnixCopyFile.java:526)
                at 
sun.nio.fs.UnixFileSystemProvider.copy(UnixFileSystemProvider.java:253)
                at java.nio.file.Files.copy(Files.java:1274)
                at org.apache.spark.util.Utils$.copyRecursive(Utils.scala:681)
                at org.apache.spark.util.Utils$.copyFile(Utils.scala:652)
                at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:725)
                at org.apache.spark.util.Utils$.fetchFile(Utils.scala:467)
                at 
org.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:162)
                at 
org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:179)
                at 
org.apache.spark.deploy.worker.DriverRunner$$anon$2.run(DriverRunner.scala:99)
        25/01/24 13:53:43 INFO ShutdownHookManager: Shutdown hook called
        25/01/24 13:53:43 INFO ShutdownHookManager: Deleting directory 
/tmp/spark-2af4f41d-c583-4698-9d8e-546a656bcf17
   [INFO] 2025-01-24 13:53:43.695 +0800 - process has exited. execute 
path:/tmp/dolphinscheduler/exec/process/default/131329535157952/131329769571008_2/6/6,
 processId:172698 ,exitStatusCode:255 ,processWaitForStatus:true 
,processExitValue:255
   [INFO] 2025-01-24 13:53:43.697 +0800 - Start finding appId in 
/opt/apache-dolphinscheduler-3.2.2-bin/worker-server/logs/20250124/131329769571008/2/6/6.log,
 fetch way: log 
   [INFO] 2025-01-24 13:53:43.698 +0800 - 
   
***********************************************************************************************
   [INFO] 2025-01-24 13:53:43.699 +0800 - *********************************  
Finalize task instance  ************************************
   [INFO] 2025-01-24 13:53:43.699 +0800 - 
***********************************************************************************************
   ```
   
   The error shows that although the jar package on MinIO was selected when configuring the workflow, DolphinScheduler still passed the worker-local temporary path to `spark-submit` at runtime. In `cluster` deploy mode the driver is launched on a Spark standalone worker, which has no such local file, so `DriverRunner.downloadUserJar` fails with `NoSuchFileException`.
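   A possible workaround, until this is fixed, is to point `spark-submit` directly at the jar's S3 location so that the Spark standalone worker fetches it itself. This is only a sketch: it assumes `hadoop-aws` and the matching `aws-java-sdk-bundle` jars are on the Spark classpath, and the `s3a://` path below is illustrative (it mirrors the bucket configured above, not a path confirmed in the log):
   ```shell
   # Hedged sketch: fetch the user jar from MinIO instead of the worker-local
   # temp directory. Assumes hadoop-aws + aws-sdk-bundle under $SPARK_HOME/jars.
   export SPARK_HOME=/opt/spark-3.5.4-bin-hadoop3
   ${SPARK_HOME}/bin/spark-submit \
     --master spark://192.168.11.17:7077 \
     --deploy-mode cluster \
     --class org.apache.spark.examples.JavaSparkPi \
     --conf spark.hadoop.fs.s3a.endpoint=http://<ip>:9000 \
     --conf spark.hadoop.fs.s3a.access.key=<minio access key> \
     --conf spark.hadoop.fs.s3a.secret.key=<minio secret key> \
     --conf spark.hadoop.fs.s3a.path.style.access=true \
     s3a://dolphinscheduler/dolphinscheduler/spark-examples_2.12-3.5.4.jar  # illustrative path
   ```
   With an `s3a://` URI, the `DriverRunner` on whichever worker launches the driver downloads the jar itself, so no shared local filesystem is required.
   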
   
   ### What you expected to happen
   
   Tasks can be submitted and run normally.
   
   ### How to reproduce
   
   You can reproduce it by following the steps above.
   
   ### Anything else
   
   The problem occurs whenever the Spark driver does not run on the same node as the DolphinScheduler worker.
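   This matches the `cluster` deploy-mode semantics: the driver is launched on an arbitrary Spark standalone worker, which then tries to copy the jar from a path that only exists on the DolphinScheduler worker node. As a quick check (a sketch reusing the exact command from the log above), switching to `client` mode keeps the driver on the DolphinScheduler worker, where the downloaded jar does exist:
   ```shell
   # Hedged sketch: with --deploy-mode client the driver runs locally on the
   # DolphinScheduler worker, so the worker-local jar path resolves.
   export SPARK_HOME=/opt/spark-3.5.4-bin-hadoop3
   ${SPARK_HOME}/bin/spark-submit \
     --master spark://192.168.11.17:7077 \
     --deploy-mode client \
     --class org.apache.spark.examples.JavaSparkPi \
     /tmp/dolphinscheduler/exec/process/default/131329535157952/131329769571008_2/6/6/spark-examples_2.12-3.5.4.jar
   ```
   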
   
   ### Version
   
   3.2.x
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.