wuchunfu opened a new issue, #16435:
URL: https://github.com/apache/dolphinscheduler/issues/16435

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### What happened
   
   When I use PostgreSQL as the metadata database for DolphinScheduler and run a data quality check task, the task fails with an error indicating that the DolphinScheduler schema does not exist.
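   Unlike MySQL, PostgreSQL distinguishes databases from schemas, so a `schema.table` reference only works if that schema actually exists in the target database. A quick way to compare the schemas PostgreSQL actually has against the `dolphinschedulers` name in the error is a sketch like this (the commented `psql` call and the `PGHOST`/`PGUSER`/`PGDATABASE` placeholders are assumptions for illustration, not part of the original report):

   ```shell
   # Diagnostic sketch: list the schemas the metadata database actually contains,
   # to compare against the "dolphinschedulers" schema named in the error.
   QUERY='SELECT schema_name FROM information_schema.schemata;'
   echo "Would run against PostgreSQL: $QUERY"
   # psql -h "$PGHOST" -U "$PGUSER" -d "$PGDATABASE" -c "$QUERY"
   ```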
   
   
   ### What you expected to happen
   
   ```bash
   [INFO] 2024-08-09 14:54:29.629 +0800 -  -> 
        24/08/09 14:54:29 INFO Client: Application report for application_1722308400032_0053 (state: RUNNING)
   [INFO] 2024-08-09 14:54:30.630 +0800 -  -> 
        24/08/09 14:54:30 INFO Client: Application report for application_1722308400032_0053 (state: FINISHED)
        24/08/09 14:54:30 INFO Client: 
                 client token: N/A
                 diagnostics: User class threw exception: org.postgresql.util.PSQLException: ERROR: schema "dolphinschedulers" does not exist
          Position: 14
                at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2676)
                at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2366)
                at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:356)
                at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:490)
                at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:408)
                at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:329)
                at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:315)
                at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:291)
                at org.postgresql.jdbc.PgStatement.executeUpdate(PgStatement.java:265)
                at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createTable(JdbcUtils.scala:844)
                at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:95)
                at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
                at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
                at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
                at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
                at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
                at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
                at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
                at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
                at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
                at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
                at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
                at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
                at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656)
                at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656)
                at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
                at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:656)
                at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
                at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
                at org.apache.dolphinscheduler.data.quality.flow.batch.writer.JdbcWriter.write(JdbcWriter.java:87)
                at org.apache.dolphinscheduler.data.quality.execution.SparkBatchExecution.executeWriter(SparkBatchExecution.java:132)
                at org.apache.dolphinscheduler.data.quality.execution.SparkBatchExecution.execute(SparkBatchExecution.java:58)
                at org.apache.dolphinscheduler.data.quality.context.DataQualityContext.execute(DataQualityContext.java:62)
                at org.apache.dolphinscheduler.data.quality.DataQualityApplication.main(DataQualityApplication.java:78)
                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                at java.lang.reflect.Method.invoke(Method.java:498)
                at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$4.run(ApplicationMaster.scala:721)
        
                 ApplicationMaster host: 10.10.4.230
                 ApplicationMaster RPC port: 0
                 queue: default
                 start time: 1723186518789
                 final status: FAILED
                 tracking URL: http://node02:8088/proxy/application_1722308400032_0053/
                 user: default
        Exception in thread "main" org.apache.spark.SparkException: Application application_1722308400032_0053 finished with failed status
                at org.apache.spark.deploy.yarn.Client.run(Client.scala:1269)
                at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1627)
                at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
                at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
                at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
                at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
                at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
        24/08/09 14:54:30 INFO ShutdownHookManager: Shutdown hook called
        24/08/09 14:54:30 INFO ShutdownHookManager: Deleting directory /tmp/spark-71c505a9-2358-4455-b8dd-3838611055c9
        24/08/09 14:54:30 INFO ShutdownHookManager: Deleting directory /tmp/spark-33420b51-2b30-4a48-9b4f-502a4b7976a0
   [INFO] 2024-08-09 14:54:30.632 +0800 - process has exited. execute path:/tmp/dolphinscheduler/exec/process/default/14382384652384/14559278303968_3/39/46, processId:1301503 ,exitStatusCode:1 ,processWaitForStatus:true ,processExitValue:1
   [INFO] 2024-08-09 14:54:30.633 +0800 - Start finding appId in /opt/dolphinscheduler/worker-server/logs/20240809/14559278303968/3/39/46.log, fetch way: log 
   [INFO] 2024-08-09 14:54:30.639 +0800 - Find appId: application_1722308400032_0053 from /opt/dolphinscheduler/worker-server/logs/20240809/14559278303968/3/39/46.log
   [INFO] 2024-08-09 14:54:30.640 +0800 - ***********************************************************************************************
   [INFO] 2024-08-09 14:54:30.640 +0800 - *********************************  Finalize task instance  ************************************
   [INFO] 2024-08-09 14:54:30.640 +0800 - ***********************************************************************************************
   [INFO] 2024-08-09 14:54:30.641 +0800 - Upload output files: [] successfully
   [INFO] 2024-08-09 14:54:30.657 +0800 - Send task execute status: FAILURE to master : 10.10.4.251:1234
   [INFO] 2024-08-09 14:54:30.658 +0800 - Remove the current task execute context from worker cache
   [INFO] 2024-08-09 14:54:30.658 +0800 - The current execute mode isn't develop mode, will clear the task execute file: /tmp/dolphinscheduler/exec/process/default/14382384652384/14559278303968_3/39/46
   [INFO] 2024-08-09 14:54:30.697 +0800 - Success clear the task execute file: /tmp/dolphinscheduler/exec/process/default/14382384652384/14559278303968_3/39/46
   [INFO] 2024-08-09 14:54:30.699 +0800 - FINALIZE_SESSION
   ```
   
   ### How to reproduce
   
   Use PostgreSQL as the metadata database for DolphinScheduler, then run a data quality check task; the error occurs on that run.
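   The stack trace fails inside `JdbcWriter.write` → `JdbcUtils.createTable`, i.e. the data quality writer cannot create its result table because the `dolphinschedulers` schema is missing. As a workaround sketch (an assumption based on the trace, not a confirmed fix), creating that schema manually before re-running the task may unblock it; the `PGHOST`/`PGUSER`/`PGDATABASE` placeholders and the commented `psql` call are illustrative only:

   ```shell
   # Workaround sketch (assumption, not a verified fix): create the schema the
   # data quality JdbcWriter expects, then re-run the task.
   # Connection values are placeholders for your environment.
   SQL='CREATE SCHEMA IF NOT EXISTS dolphinschedulers;'
   echo "Would run against PostgreSQL: $SQL"
   # psql -h "$PGHOST" -U "$PGUSER" -d "$PGDATABASE" -c "$SQL"
   ```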
   
   ### Anything else
   
   _No response_
   
   ### Version
   
   3.2.x
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   

