Bingz2 opened a new issue, #2849: URL: https://github.com/apache/incubator-seatunnel/issues/2849
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/incubator-seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.

### What happened

An error is reported when using the Spark Doris sink.

### SeaTunnel Version

2.1.3

### SeaTunnel Config

```conf
env {
  # You can set spark configuration here
  # see available properties defined by spark: https://spark.apache.org/docs/latest/configuration.html#available-properties
  spark.app.name = "ST-hive2doris-test"
  spark.executor.instances = "1"
  spark.executor.cores = "5"
  spark.executor.memory = "2g"
  spark.sql.catalogImplementation = "hive"
  spark.dynamicAllocation.enabled = false
}

source {
  # This is an example input plugin **only for test and to demonstrate the input plugin feature**
  hive {
    pre_sql = "select * from custom.migu1212_livodpart"
    result_table_name = "my_dataset"
  }
}

transform {
  # you can also use other filter plugins, such as sql
  # sql {
  #   sql = "select * from accesslog where request_time > 1000"
  # }
}

sink {
  Doris {
    fenodes = "localhost:6033"
    database = "test"
    table = "migu1212_livodpart"
    user = "root"
    password = "123456"
    batch_size = 20000000
    doris.column_separator = "\t"
    # doris.columns = ""
  }
}
```

### Running Command

```shell
./bin/start-seatunnel-spark.sh -m yarn -e client -c config/spark_batch_hive2doris_test.conf
```

### Error Exception

```log
22/09/22 23:11:33 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, slave6.test.gitv.we, executor 8): java.lang.NoSuchFieldError: INSTANCE
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.<clinit>(SSLConnectionSocketFactory.java:144)
    at org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:955)
    at org.apache.seatunnel.spark.doris.sink.DorisUtil$.createClient(DorisUtil.scala:44)
    at org.apache.seatunnel.spark.doris.sink.DorisUtil.saveMessages(DorisUtil.scala:108)
    at org.apache.seatunnel.spark.doris.sink.Doris$$anonfun$output$1.apply(Doris.scala:72)
    at org.apache.seatunnel.spark.doris.sink.Doris$$anonfun$output$1.apply(Doris.scala:59)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
22/09/22 23:11:33 INFO scheduler.TaskSetManager: Starting task 0.1 in stage 0.0 (TID 2, slave7.test.gitv.we, executor 10, partition 0, NODE_LOCAL, 7941 bytes)
22/09/22 23:11:33 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on slave7.test.gitv.we:20062 (size: 8.4 KB, free: 13.2 GB)
22/09/22 23:11:33 INFO scheduler.TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1) on slave6.test.gitv.we, executor 7: java.lang.NoSuchFieldError (INSTANCE) [duplicate 1]
22/09/22 23:11:33 INFO scheduler.TaskSetManager: Starting task 1.1 in stage 0.0 (TID 3, slave7.test.gitv.we, executor 9, partition 1, NODE_LOCAL, 7941 bytes)
22/09/22 23:11:34 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on slave7.test.gitv.we:22734 (size: 8.4 KB, free: 13.2 GB)
22/09/22 23:11:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on slave7.test.gitv.we:20062 (size: 32.8 KB, free: 13.2 GB)
22/09/22 23:11:35 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on slave7.test.gitv.we:22734 (size: 32.8 KB, free: 13.2 GB)
22/09/22 23:11:38 INFO scheduler.TaskSetManager: Lost task 0.1 in stage 0.0 (TID 2) on slave7.test.gitv.we, executor 10: java.lang.NoSuchFieldError (INSTANCE) [duplicate 2]
22/09/22 23:11:38 INFO scheduler.TaskSetManager: Starting task 0.2 in stage 0.0 (TID 4, slave7.test.gitv.we, executor 10, partition 0, NODE_LOCAL, 7941 bytes)
22/09/22 23:11:38 WARN scheduler.TaskSetManager: Lost task 0.2 in stage 0.0 (TID 4, slave7.test.gitv.we, executor 10): java.lang.NoClassDefFoundError: Could not initialize class org.apache.http.conn.ssl.SSLConnectionSocketFactory
    at org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:955)
    at org.apache.seatunnel.spark.doris.sink.DorisUtil$.createClient(DorisUtil.scala:44)
    at org.apache.seatunnel.spark.doris.sink.DorisUtil.saveMessages(DorisUtil.scala:108)
    at org.apache.seatunnel.spark.doris.sink.Doris$$anonfun$output$1.apply(Doris.scala:72)
    at org.apache.seatunnel.spark.doris.sink.Doris$$anonfun$output$1.apply(Doris.scala:59)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
22/09/22 23:11:38 INFO scheduler.TaskSetManager: Starting task 0.3 in stage 0.0 (TID 5, slave6.test.gitv.we, executor 8, partition 0, NODE_LOCAL, 7941 bytes)
22/09/22 23:11:38 INFO scheduler.TaskSetManager: Lost task 0.3 in stage 0.0 (TID 5) on slave6.test.gitv.we, executor 8: java.lang.NoClassDefFoundError (Could not initialize class org.apache.http.conn.ssl.SSLConnectionSocketFactory) [duplicate 1]
22/09/22 23:11:38 ERROR scheduler.TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
22/09/22 23:11:38 INFO cluster.YarnScheduler: Cancelling stage 0
22/09/22 23:11:38 INFO cluster.YarnScheduler: Killing all running tasks in stage 0: Stage cancelled
22/09/22 23:11:38 INFO cluster.YarnScheduler: Stage 0 was cancelled
22/09/22 23:11:38 INFO scheduler.DAGScheduler: ResultStage 0 (foreachPartition at Doris.scala:59) failed in 11.078 s due to Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 5, slave6.test.gitv.we, executor 8): java.lang.NoClassDefFoundError: Could not initialize class org.apache.http.conn.ssl.SSLConnectionSocketFactory
    at org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:955)
    at org.apache.seatunnel.spark.doris.sink.DorisUtil$.createClient(DorisUtil.scala:44)
    at org.apache.seatunnel.spark.doris.sink.DorisUtil.saveMessages(DorisUtil.scala:108)
    at org.apache.seatunnel.spark.doris.sink.Doris$$anonfun$output$1.apply(Doris.scala:72)
    at org.apache.seatunnel.spark.doris.sink.Doris$$anonfun$output$1.apply(Doris.scala:59)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
22/09/22 23:11:38 INFO scheduler.DAGScheduler: Job 0 failed: foreachPartition at Doris.scala:59, took 11.130926 s
22/09/22 23:11:38 ERROR base.Seatunnel: ===============================================================================
22/09/22 23:11:38 ERROR base.Seatunnel: Fatal Error,
22/09/22 23:11:38 ERROR base.Seatunnel: Please submit bug report in https://github.com/apache/incubator-seatunnel/issues
22/09/22 23:11:38 ERROR base.Seatunnel: Reason:Execute Spark task error
22/09/22 23:11:38 ERROR base.Seatunnel: Exception StackTrace:org.apache.seatunnel.core.base.exception.CommandExecuteException: Execute Spark task error
    at org.apache.seatunnel.core.spark.command.SparkTaskExecuteCommand.execute(SparkTaskExecuteCommand.java:70)
    at org.apache.seatunnel.core.base.Seatunnel.run(Seatunnel.java:40)
    at org.apache.seatunnel.core.spark.SeatunnelSpark.main(SeatunnelSpark.java:33)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 5, slave6.test.gitv.we, executor 8): java.lang.NoClassDefFoundError: Could not initialize class org.apache.http.conn.ssl.SSLConnectionSocketFactory
    at org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:955)
    at org.apache.seatunnel.spark.doris.sink.DorisUtil$.createClient(DorisUtil.scala:44)
    at org.apache.seatunnel.spark.doris.sink.DorisUtil.saveMessages(DorisUtil.scala:108)
    at org.apache.seatunnel.spark.doris.sink.Doris$$anonfun$output$1.apply(Doris.scala:72)
    at org.apache.seatunnel.spark.doris.sink.Doris$$anonfun$output$1.apply(Doris.scala:59)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1889)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1877)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1876)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1876)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2110)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:935)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:933)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
    at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:933)
    at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$1.apply$mcV$sp(Dataset.scala:2735)
    at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$1.apply(Dataset.scala:2735)
    at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$1.apply(Dataset.scala:2735)
    at org.apache.spark.sql.Dataset$$anonfun$withNewRDDExecutionId$1.apply(Dataset.scala:3349)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
    at org.apache.spark.sql.Dataset.withNewRDDExecutionId(Dataset.scala:3345)
    at org.apache.spark.sql.Dataset.foreachPartition(Dataset.scala:2734)
    at org.apache.seatunnel.spark.doris.sink.Doris.output(Doris.scala:59)
    at org.apache.seatunnel.spark.doris.sink.Doris.output(Doris.scala:30)
    at org.apache.seatunnel.spark.SparkEnvironment.sinkProcess(SparkEnvironment.java:179)
    at org.apache.seatunnel.spark.batch.SparkBatchExecution.start(SparkBatchExecution.java:54)
    at org.apache.seatunnel.core.spark.command.SparkTaskExecuteCommand.execute(SparkTaskExecuteCommand.java:67)
    ... 14 more
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.http.conn.ssl.SSLConnectionSocketFactory
    at org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:955)
    at org.apache.seatunnel.spark.doris.sink.DorisUtil$.createClient(DorisUtil.scala:44)
    at org.apache.seatunnel.spark.doris.sink.DorisUtil.saveMessages(DorisUtil.scala:108)
    at org.apache.seatunnel.spark.doris.sink.Doris$$anonfun$output$1.apply(Doris.scala:72)
    at org.apache.seatunnel.spark.doris.sink.Doris$$anonfun$output$1.apply(Doris.scala:59)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
```

### Flink or Spark Version

Spark 2.4

### Java or Scala Version

Java 1.8

### Screenshots

_No response_

### Are you willing to submit PR?

- [ ] Yes, I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
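A note for anyone triaging this: `java.lang.NoSuchFieldError: INSTANCE` thrown from `SSLConnectionSocketFactory.<clinit>` is the usual signature of an Apache HttpClient version conflict; once that static initializer fails, every later use of the class surfaces as the `NoClassDefFoundError: Could not initialize class ...` seen in the log. On YARN clusters this typically happens when the Hadoop/Spark classpath carries an older httpclient than the one the connector was built against. The sketch below only illustrates the check; the jar names, version numbers, and directory hints in it are assumptions, not facts taken from this cluster.

```shell
# Sketch: flag conflicting Apache HttpClient versions on a classpath.
# The jar list below is illustrative; on a real cluster you would build it
# from the executor classpath, e.g. the jars under $SPARK_HOME/jars and the
# Hadoop share/*/lib directories.
jars='httpclient-4.2.5.jar
httpcore-4.2.4.jar
httpclient-4.5.13.jar
httpcore-4.4.13.jar'

# Count distinct httpclient versions; more than one signals a conflict that
# can surface as NoSuchFieldError: INSTANCE in SSLConnectionSocketFactory.
n=$(printf '%s\n' "$jars" | sed -n 's/^httpclient-\([0-9.]*\)\.jar$/\1/p' | sort -u | wc -l)
if [ "$n" -gt 1 ]; then
  echo "CONFLICT: multiple httpclient versions on the classpath"
else
  echo "OK: single httpclient version"
fi
```

If a conflict shows up, common mitigations are shading/relocating httpclient inside the connector jar or experimenting with `spark.driver.userClassPathFirst=true` / `spark.executor.userClassPathFirst=true`; whether either is appropriate here depends on how the SeaTunnel distribution is built.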
