zuzuviewer opened a new issue, #12091: URL: https://github.com/apache/dolphinscheduler/issues/12091
### Search before asking - [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues. ### What happened [LOG-PATH]: /opt/dolphinscheduler/logs/20220921/6885727510752_21-95-340.log, [HOST]: Host{address='172.21.0.5:1234', ip='172.21.0.5', port=1234} [INFO] 2022-09-21 09:45:07.696 +0000 [taskAppId=TASK-20220921-6885727510752_21-95-340] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.dq.DataQualityTask:[83] - data quality task params {"localParams":[],"resourceList":[],"ruleId":1,"ruleInputParameter":{"check_type":"0","comparison_type":7,"failure_strategy":"0","operator":"5","src_connector_type":0,"src_datasource_id":1,"src_field":"name","src_table":"user","threshold":"0"},"sparkParameters":{"deployMode":"cluster","driverCores":1,"driverMemory":"512M","executorCores":2,"executorMemory":"2G","numExecutors":2,"others":"--conf spark.yarn.maxAppAttempts=1"}} [INFO] 2022-09-21 09:45:07.706 +0000 [taskAppId=TASK-20220921-6885727510752_21-95-340] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.dq.DataQualityTask:[181] - data quality task command: ${SPARK_HOME2}/bin/spark-submit --master yarn --deploy-mode cluster --driver-cores 1 --driver-memory 512M --num-executors 2 --executor-cores 2 --executor-memory 2G --queue default --conf spark.yarn.maxAppAttempts=1 /opt/dolphinscheduler/libs/dolphinscheduler-data-quality-dev-SNAPSHOT.jar "{\"name\":\"$t(null_check)\",\"env\":{\"type\":\"batch\",\"config\":null},\"readers\":[{\"type\":\"JDBC\",\"config\":{\"database\":\"test\",\"password\":\"********\",\"driver\":\"com.mysql.cj.jdbc.Driver\",\"user\":\"root\",\"output_table\":\"test_user\",\"table\":\"user\",\"url\":\"jdbc:mysql://192.168.0.152:3306/test?useUnicode=true&allowPublicKeyRetrival=true&characterEncoding=utf-8&&allowLoadLocalInfile=false&autoDeserialize=false&allowLocalInfile=false&allowUrlInLocalInfile=false\"} }],\"transf ormers\":[{\"type\":\"sql\",\"config\":{\"index\":1,\"output_table\":\"total_count\",\"sql\":\"SELECT COUNT(*) AS total FROM test_user \"} },{\"type\":\"sql\",\"config\":{\"index\":2,\"output_table\":\"null_items\",\"sql\":\"SELECT * FROM test_user WHERE (name is null or name = '') \"} },{\"type\":\"sql\",\"config\":{\"index\":3,\"output_table\":\"null_count\",\"sql\":\"SELECT COUNT(*) AS nulls FROM null_items\"} }],\"writers\":[{\"type\":\"JDBC\",\"config\":{\"database\":\"dolphinscheduler\",\"password\":\"****\",\"driver\":\"org.postgresql.Driver\",\"user\":\"root\",\"table\":\"t_ds_dq_execute_result\",\"url\":\"jdbc:postgresql://dolphinscheduler-postgresql:5432/dolphinscheduler\",\"sql\":\"select 0 as rule_type,'$t(null_check)' as rule_name,0 as process_definition_id,95 as process_instance_id,340 as task_instance_id,null_count.nulls AS statistics_value,total_count.total AS comparison_value,7 AS comparison_type,0 as check_type,0 as threshold,5 as operator,0 as failure_strategy,'hd fs://mycluster:8020/user/root/data_quality_error_data/0_95_check-sql' as error_output_path,'2022-09-21 09:45:07' as create_time,'2022-09-21 09:45:07' as update_time from null_count full join total_count\"} },{\"type\":\"JDBC\",\"config\":{\"database\":\"dolphinscheduler\",\"password\":\"****\",\"driver\":\"org.postgresql.Driver\",\"user\":\"root\",\"table\":\"t_ds_dq_task_statistics_value\",\"url\":\"jdbc:postgresql://dolphinscheduler-postgresql:5432/dolphinscheduler\",\"sql\":\"select 0 as process_definition_id,340 as task_instance_id,1 as rule_id,'ZDHVZ+SCMY2SSYPLT/1PITREQ8XN7ECZXCUIEEI3/BG=' as unique_code,'null_count.nulls'AS statistics_name,null_count.nulls AS statistics_value,'2022-09-21 09:45:07' as data_time,'2022-09-21 09:45:07' as create_time,'2022-09-21 09:45:07' as update_time from null_count\"} },{\"type\":\"hdfs_file\",\"config\":{\"path\":\"hdfs://mycluster:8020/user/root/data_quality_error_data/0_95_check-sql\",\"input_table\":\"null_items\"} }]}" [INFO] 2022-09-21 09:45:07.708 +0000 [taskAppId=TASK-20220921-6885727510752_21-95-340] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.dq.DataQualityTask:[85] - tenantCode user:root, task dir:95_340 [INFO] 2022-09-21 09:45:07.709 +0000 [taskAppId=TASK-20220921-6885727510752_21-95-340] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.dq.DataQualityTask:[90] - create command file:/tmp/dolphinscheduler/exec/process/6885697322976/6885727510752_21/95/340/95_340.command [INFO] 2022-09-21 09:45:07.709 +0000 [taskAppId=TASK-20220921-6885727510752_21-95-340] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.dq.DataQualityTask:[116] - command : #!/bin/sh BASEDIR=$(cd `dirname $0`; pwd) cd $BASEDIR export HADOOP_HOME=/opt/soft/hadoop-3.3.4 export HADOOP_CONF_DIR=/etc/hadoop/conf export SPARK_HOME2=/opt/soft/spark export SPARK_HOME=/opt/soft/spark export PYTHON_HOME=/opt/soft/python export JAVA_HOME=/usr/local/openjdk-8 export HIVE_HOME=/opt/soft/hive export FLINK_HOME=/opt/soft/flink export DATAX_HOME=/opt/soft/datax export YARN_CONF_DIR=/etc/hadoop/conf export PATH=$HADOOP_HOME/bin:$SPARK_HOME/bin:$PYTHON_HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_HOME/bin:$PATH export HADOOP_CLASSPATH=`hadoop classpath` ${SPARK_HOME2}/bin/spark-submit --master yarn --deploy-mode cluster --driver-cores 1 --driver-memory 512M --num-executors 2 --executor-cores 2 --executor-memory 2G --queue default --conf spark.yarn.maxAppAttempts=1 /opt/dolphinscheduler/libs/dolphinscheduler-data-quality-dev-SNAPSHOT.jar "{\"name\":\"$t(null_check)\",\"env\":{\"type\":\"batch\",\"config\":null},\"readers\":[{\"type\":\"JDBC\",\"config\":{\"database\":\"test\",\"password\":\"********\",\"driver\":\"com.mysql.cj.jdbc.Driver\",\"user\":\"root\",\"output_table\":\"test_user\",\"table\":\"user\",\"url\":\"jdbc:mysql://192.168.0.152:3306/test?useUnicode=true&allowPublicKeyRetrival=true&characterEncoding=utf-8&&allowLoadLocalInfile=false&autoDeserialize=false&allowLocalInfile=false&allowUrlInLocalInfile=false\"} }],\"transformers\":[{\"type\":\"sql\",\"config\":{\"index\":1,\"output_table\":\"total_count\",\"sql\":\"SELECT COUNT(*) AS total FROM test_user \"} },{\"type\":\"sql\",\"config\":{\"index\":2,\"output_table\":\ "null_items\",\"sql\":\"SELECT * FROM test_user WHERE (name is null or name = '') \"} },{\"type\":\"sql\",\"config\":{\"index\":3,\"output_table\":\"null_count\",\"sql\":\"SELECT COUNT(*) AS nulls FROM null_items\"} }],\"writers\":[{\"type\":\"JDBC\",\"config\":{\"database\":\"dolphinscheduler\",\"password\":\"****\",\"driver\":\"org.postgresql.Driver\",\"user\":\"root\",\"table\":\"t_ds_dq_execute_result\",\"url\":\"jdbc:postgresql://dolphinscheduler-postgresql:5432/dolphinscheduler\",\"sql\":\"select 0 as rule_type,'$t(null_check)' as rule_name,0 as process_definition_id,95 as process_instance_id,340 as task_instance_id,null_count.nulls AS statistics_value,total_count.total AS comparison_value,7 AS comparison_type,0 as check_type,0 as threshold,5 as operator,0 as failure_strategy,'hdfs://mycluster:8020/user/root/data_quality_error_data/0_95_check-sql' as error_output_path,'2022-09-21 09:45:07' as create_time,'2022-09-21 09:45:07' as update_time from null_count full join total_coun t\"} },{\"type\":\"JDBC\",\"config\":{\"database\":\"dolphinscheduler\",\"password\":\"****\",\"driver\":\"org.postgresql.Driver\",\"user\":\"root\",\"table\":\"t_ds_dq_task_statistics_value\",\"url\":\"jdbc:postgresql://dolphinscheduler-postgresql:5432/dolphinscheduler\",\"sql\":\"select 0 as process_definition_id,340 as task_instance_id,1 as rule_id,'ZDHVZ+SCMY2SSYPLT/1PITREQ8XN7ECZXCUIEEI3/BG=' as unique_code,'null_count.nulls'AS statistics_name,null_count.nulls AS statistics_value,'2022-09-21 09:45:07' as data_time,'2022-09-21 09:45:07' as create_time,'2022-09-21 09:45:07' as update_time from null_count\"} },{\"type\":\"hdfs_file\",\"config\":{\"path\":\"hdfs://mycluster:8020/user/root/data_quality_error_data/0_95_check-sql\",\"input_table\":\"null_items\"} }]}" [INFO] 2022-09-21 09:45:07.720 +0000 [taskAppId=TASK-20220921-6885727510752_21-95-340] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.dq.DataQualityTask:[290] - task run command: sudo -u root sh /tmp/dolphinscheduler/exec/process/6885697322976/6885727510752_21/95/340/95_340.command [INFO] 2022-09-21 09:45:07.727 +0000 [taskAppId=TASK-20220921-6885727510752_21-95-340] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.dq.DataQualityTask:[181] - process start, process id is: 9391 [INFO] 2022-09-21 09:45:08.729 +0000 [taskAppId=TASK-20220921-6885727510752_21-95-340] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.dq.DataQualityTask:[63] - -> WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete. [INFO] 2022-09-21 09:45:09.731 +0000 [taskAppId=TASK-20220921-6885727510752_21-95-340] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.dq.DataQualityTask:[63] - -> 2022-09-21 17:45:09 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable [INFO] 2022-09-21 09:45:10.732 +0000 [taskAppId=TASK-20220921-6885727510752_21-95-340] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.dq.DataQualityTask:[63] - -> 2022-09-21 17:45:09 INFO RMProxy:98 - Connecting to ResourceManager at /192.168.0.153:8032 2022-09-21 17:45:10 INFO Client:54 - Requesting a new application from cluster with 1 NodeManagers 2022-09-21 17:45:10 INFO Client:54 - Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container) 2022-09-21 17:45:10 INFO Client:54 - Will allocate AM container, with 896 MB memory including 384 MB overhead 2022-09-21 17:45:10 INFO Client:54 - Setting up container launch context for our AM 2022-09-21 17:45:10 INFO Client:54 - Setting up the launch environment for our AM container 2022-09-21 17:45:10 INFO Client:54 - Preparing resources for our AM container 2022-09-21 17:45:10 WARN Client:66 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME. [INFO] 2022-09-21 09:45:12.732 +0000 [taskAppId=TASK-20220921-6885727510752_21-95-340] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.dq.DataQualityTask:[63] - -> 2022-09-21 17:45:12 INFO Client:54 - Uploading resource file:/tmp/spark-bef9cb3f-fdfb-4198-9d2c-1a1096fbcf40/__spark_libs__2241309768306226262.zip -> file:/root/.sparkStaging/application_1663722602659_0008/__spark_libs__2241309768306226262.zip [INFO] 2022-09-21 09:45:13.733 +0000 [taskAppId=TASK-20220921-6885727510752_21-95-340] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.dq.DataQualityTask:[63] - -> 2022-09-21 17:45:13 INFO Client:54 - Uploading resource file:/opt/dolphinscheduler/libs/dolphinscheduler-data-quality-dev-SNAPSHOT.jar -> file:/root/.sparkStaging/application_1663722602659_0008/dolphinscheduler-data-quality-dev-SNAPSHOT.jar 2022-09-21 17:45:13 INFO Client:54 - Uploading resource file:/tmp/spark-bef9cb3f-fdfb-4198-9d2c-1a1096fbcf40/__spark_conf__3800795750991638219.zip -> file:/root/.sparkStaging/application_1663722602659_0008/__spark_conf__.zip 2022-09-21 17:45:13 INFO SecurityManager:54 - Changing view acls to: root 2022-09-21 17:45:13 INFO SecurityManager:54 - Changing modify acls to: root 2022-09-21 17:45:13 INFO SecurityManager:54 - Changing view acls groups to: 2022-09-21 17:45:13 INFO SecurityManager:54 - Changing modify acls groups to: 2022-09-21 17:45:13 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set() [INFO] 2022-09-21 09:45:15.735 +0000 [taskAppId=TASK-20220921-6885727510752_21-95-340] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.dq.DataQualityTask:[63] - -> 2022-09-21 17:45:15 INFO Client:54 - Submitting application application_1663722602659_0008 to ResourceManager 2022-09-21 17:45:15 INFO YarnClientImpl:273 - Submitted application application_1663722602659_0008 [INFO] 2022-09-21 09:45:16.530 +0000 [taskAppId=TASK-20220921-6885727510752_21-95-340] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.dq.DataQualityTask:[375] - find app id: application_1663722602659_0008 [INFO] 2022-09-21 09:45:16.531 +0000 [taskAppId=TASK-20220921-6885727510752_21-95-340] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.dq.DataQualityTask:[205] - process has exited, execute path:/tmp/dolphinscheduler/exec/process/6885697322976/6885727510752_21/95/340, processId:9391 ,exitStatusCode:1 ,processWaitForStatus:true ,processExitValue:1 [INFO] 2022-09-21 09:45:16.737 +0000 [taskAppId=TASK-20220921-6885727510752_21-95-340] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.dq.DataQualityTask:[63] - -> 2022-09-21 17:45:16 INFO Client:54 - Application report for application_1663722602659_0008 (state: FAILED) 2022-09-21 17:45:16 INFO Client:54 - client token: N/A diagnostics: Application application_1663722602659_0008 failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_1663722602659_0008_000001 exited with exitCode: -1000 Failing this attempt.Diagnostics: [2022-09-21 17:45:16.207]File file:/root/.sparkStaging/application_1663722602659_0008/__spark_libs__2241309768306226262.zip does not exist java.io.FileNotFoundException: File file:/root/.sparkStaging/application_1663722602659_0008/__spark_libs__2241309768306226262.zip does not exist at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:779) at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:1100) at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:769) at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:462) at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:271) at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:68) at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:415) at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:412) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:412) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:247) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:240) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:228) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750) For more detailed output, check the application tracking page: http://bogon:8088/cluster/app/application_1663722602659_0008 Then click on links to logs of each attempt. . Failing the application. ApplicationMaster host: N/A ApplicationMaster RPC port: -1 queue: default start time: 1663753515261 final status: FAILED tracking URL: http://bogon:8088/cluster/app/application_1663722602659_0008 user: root 2022-09-21 17:45:16 INFO Client:54 - Deleted staging directory file:/root/.sparkStaging/application_1663722602659_0008 2022-09-21 17:45:16 ERROR Client:70 - Application diagnostics message: Application application_1663722602659_0008 failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_1663722602659_0008_000001 exited with exitCode: -1000 Failing this attempt.Diagnostics: [2022-09-21 17:45:16.207]File file:/root/.sparkStaging/application_1663722602659_0008/__spark_libs__2241309768306226262.zip does not exist java.io.FileNotFoundException: File file:/root/.sparkStaging/application_1663722602659_0008/__spark_libs__2241309768306226262.zip does not exist at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:779) at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:1100) at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:769) at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:462) at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:271) at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:68) at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:415) at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:412) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:412) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:247) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:240) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:228) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750) For more detailed output, check the application tracking page: http://bogon:8088/cluster/app/application_1663722602659_0008 Then click on links to logs of each attempt. . Failing the application. Exception in thread "main" org.apache.spark.SparkException: Application application_1663722602659_0008 finished with failed status at org.apache.spark.deploy.yarn.Client.run(Client.scala:1149) at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1526) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 2022-09-21 17:45:16 INFO ShutdownHookManager:54 - Shutdown hook called 2022-09-21 17:45:16 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-bff797a5-84f1-448f-8960-4c5c69920c22 2022-09-21 17:45:16 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-bef9cb3f-fdfb-4198-9d2c-1a1096fbcf40 [INFO] 2022-09-21 09:45:16.741 +0000 [taskAppId=TASK-20220921-6885727510752_21-95-340] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.dq.DataQualityTask:[57] - FINALIZE_SESSION ### What you expected to happen success ### How to reproduce data quality task check null value with mysql data source ### Anything else spark 2.4.0 hadoop 3.3.4 ### Version 3.0.0 ### Are you willing to submit PR? - [ ] Yes I am willing to submit a PR! ### Code of Conduct - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
