Bingz2 opened a new issue, #2849:
URL: https://github.com/apache/incubator-seatunnel/issues/2849

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/incubator-seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.
   
   
   ### What happened
   
   The job failed with `java.lang.NoSuchFieldError: INSTANCE` when writing to Doris with the Spark Doris Sink.
   
   ### SeaTunnel Version
   
   2.1.3
   
   ### SeaTunnel Config
   
   ```conf
   env {
     # You can set spark configuration here
     # see available properties defined by spark: https://spark.apache.org/docs/latest/configuration.html#available-properties
     spark.app.name = "ST-hive2doris-test"
     spark.executor.instances = "1"
     spark.executor.cores = "5"
     spark.executor.memory = "2g"
     spark.sql.catalogImplementation = "hive"
     spark.dynamicAllocation.enabled = false
   }
   
   source {
     # This is an example input plugin, **only to test and demonstrate the input plugin feature**
     hive {
       pre_sql = "select * from custom.migu1212_livodpart"
       result_table_name = "my_dataset"
     }
   }
   
   transform {
     # you can also use other filter plugins, such as sql
     # sql {
     #   sql = "select * from accesslog where request_time > 1000"
     # }
   }
   sink {
     Doris {
       fenodes = "localhost:6033"
       database = "test"
       table = "migu1212_livodpart"
       user = "root"
       password = "123456"
       batch_size = 20000000
       doris.column_separator = "\t"
       # doris.columns = ""
     }
   }
   ```
   
   
   ### Running Command
   
   ```shell
   ./bin/start-seatunnel-spark.sh -m yarn -e client -c config/spark_batch_hive2doris_test.conf
   ```
   
   
   ### Error Exception
   
   ```log
    22/09/22 23:11:33 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, slave6.test.gitv.we, executor 8): java.lang.NoSuchFieldError: INSTANCE
            at org.apache.http.conn.ssl.SSLConnectionSocketFactory.<clinit>(SSLConnectionSocketFactory.java:144)
            at org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:955)
            at org.apache.seatunnel.spark.doris.sink.DorisUtil$.createClient(DorisUtil.scala:44)
            at org.apache.seatunnel.spark.doris.sink.DorisUtil.saveMessages(DorisUtil.scala:108)
            at org.apache.seatunnel.spark.doris.sink.Doris$$anonfun$output$1.apply(Doris.scala:72)
            at org.apache.seatunnel.spark.doris.sink.Doris$$anonfun$output$1.apply(Doris.scala:59)
            at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
            at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
            at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
            at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
            at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
            at org.apache.spark.scheduler.Task.run(Task.scala:121)
            at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
            at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405)
            at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:745)
    
    22/09/22 23:11:33 INFO scheduler.TaskSetManager: Starting task 0.1 in stage 0.0 (TID 2, slave7.test.gitv.we, executor 10, partition 0, NODE_LOCAL, 7941 bytes)
    22/09/22 23:11:33 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on slave7.test.gitv.we:20062 (size: 8.4 KB, free: 13.2 GB)
    22/09/22 23:11:33 INFO scheduler.TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1) on slave6.test.gitv.we, executor 7: java.lang.NoSuchFieldError (INSTANCE) [duplicate 1]
    22/09/22 23:11:33 INFO scheduler.TaskSetManager: Starting task 1.1 in stage 0.0 (TID 3, slave7.test.gitv.we, executor 9, partition 1, NODE_LOCAL, 7941 bytes)
    22/09/22 23:11:34 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on slave7.test.gitv.we:22734 (size: 8.4 KB, free: 13.2 GB)
    22/09/22 23:11:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on slave7.test.gitv.we:20062 (size: 32.8 KB, free: 13.2 GB)
    22/09/22 23:11:35 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on slave7.test.gitv.we:22734 (size: 32.8 KB, free: 13.2 GB)
    22/09/22 23:11:38 INFO scheduler.TaskSetManager: Lost task 0.1 in stage 0.0 (TID 2) on slave7.test.gitv.we, executor 10: java.lang.NoSuchFieldError (INSTANCE) [duplicate 2]
    22/09/22 23:11:38 INFO scheduler.TaskSetManager: Starting task 0.2 in stage 0.0 (TID 4, slave7.test.gitv.we, executor 10, partition 0, NODE_LOCAL, 7941 bytes)
    22/09/22 23:11:38 WARN scheduler.TaskSetManager: Lost task 0.2 in stage 0.0 (TID 4, slave7.test.gitv.we, executor 10): java.lang.NoClassDefFoundError: Could not initialize class org.apache.http.conn.ssl.SSLConnectionSocketFactory
            at org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:955)
            at org.apache.seatunnel.spark.doris.sink.DorisUtil$.createClient(DorisUtil.scala:44)
            at org.apache.seatunnel.spark.doris.sink.DorisUtil.saveMessages(DorisUtil.scala:108)
            at org.apache.seatunnel.spark.doris.sink.Doris$$anonfun$output$1.apply(Doris.scala:72)
            at org.apache.seatunnel.spark.doris.sink.Doris$$anonfun$output$1.apply(Doris.scala:59)
            at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
            at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
            at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
            at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
            at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
            at org.apache.spark.scheduler.Task.run(Task.scala:121)
            at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
            at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405)
            at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:745)
    
    22/09/22 23:11:38 INFO scheduler.TaskSetManager: Starting task 0.3 in stage 0.0 (TID 5, slave6.test.gitv.we, executor 8, partition 0, NODE_LOCAL, 7941 bytes)
    22/09/22 23:11:38 INFO scheduler.TaskSetManager: Lost task 0.3 in stage 0.0 (TID 5) on slave6.test.gitv.we, executor 8: java.lang.NoClassDefFoundError (Could not initialize class org.apache.http.conn.ssl.SSLConnectionSocketFactory) [duplicate 1]
    22/09/22 23:11:38 ERROR scheduler.TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
    22/09/22 23:11:38 INFO cluster.YarnScheduler: Cancelling stage 0
    22/09/22 23:11:38 INFO cluster.YarnScheduler: Killing all running tasks in stage 0: Stage cancelled
    22/09/22 23:11:38 INFO cluster.YarnScheduler: Stage 0 was cancelled
    22/09/22 23:11:38 INFO scheduler.DAGScheduler: ResultStage 0 (foreachPartition at Doris.scala:59) failed in 11.078 s due to Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 5, slave6.test.gitv.we, executor 8): java.lang.NoClassDefFoundError: Could not initialize class org.apache.http.conn.ssl.SSLConnectionSocketFactory
            at org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:955)
            at org.apache.seatunnel.spark.doris.sink.DorisUtil$.createClient(DorisUtil.scala:44)
            at org.apache.seatunnel.spark.doris.sink.DorisUtil.saveMessages(DorisUtil.scala:108)
            at org.apache.seatunnel.spark.doris.sink.Doris$$anonfun$output$1.apply(Doris.scala:72)
            at org.apache.seatunnel.spark.doris.sink.Doris$$anonfun$output$1.apply(Doris.scala:59)
            at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
            at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
            at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
            at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
            at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
            at org.apache.spark.scheduler.Task.run(Task.scala:121)
            at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
            at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405)
            at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:745)
    
    Driver stacktrace:
    22/09/22 23:11:38 INFO scheduler.DAGScheduler: Job 0 failed: foreachPartition at Doris.scala:59, took 11.130926 s
    22/09/22 23:11:38 ERROR base.Seatunnel: 
    ===============================================================================
    
    22/09/22 23:11:38 ERROR base.Seatunnel: Fatal Error, 
    
    22/09/22 23:11:38 ERROR base.Seatunnel: Please submit bug report in https://github.com/apache/incubator-seatunnel/issues
    
    22/09/22 23:11:38 ERROR base.Seatunnel: Reason:Execute Spark task error 
    
    22/09/22 23:11:38 ERROR base.Seatunnel: Exception StackTrace:org.apache.seatunnel.core.base.exception.CommandExecuteException: Execute Spark task error
            at org.apache.seatunnel.core.spark.command.SparkTaskExecuteCommand.execute(SparkTaskExecuteCommand.java:70)
            at org.apache.seatunnel.core.base.Seatunnel.run(Seatunnel.java:40)
            at org.apache.seatunnel.core.spark.SeatunnelSpark.main(SeatunnelSpark.java:33)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:498)
            at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
            at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
            at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
            at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
            at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
            at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
            at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
            at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 5, slave6.test.gitv.we, executor 8): java.lang.NoClassDefFoundError: Could not initialize class org.apache.http.conn.ssl.SSLConnectionSocketFactory
            at org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:955)
            at org.apache.seatunnel.spark.doris.sink.DorisUtil$.createClient(DorisUtil.scala:44)
            at org.apache.seatunnel.spark.doris.sink.DorisUtil.saveMessages(DorisUtil.scala:108)
            at org.apache.seatunnel.spark.doris.sink.Doris$$anonfun$output$1.apply(Doris.scala:72)
            at org.apache.seatunnel.spark.doris.sink.Doris$$anonfun$output$1.apply(Doris.scala:59)
            at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
            at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
            at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
            at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
            at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
            at org.apache.spark.scheduler.Task.run(Task.scala:121)
            at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
            at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405)
            at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:745)
    
    Driver stacktrace:
            at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1889)
            at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1877)
            at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1876)
            at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
            at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
            at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1876)
            at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
            at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
            at scala.Option.foreach(Option.scala:257)
            at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
            at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2110)
            at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
            at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
            at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
            at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
            at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
            at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
            at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
            at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
            at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:935)
            at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:933)
            at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
            at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
            at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
            at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:933)
            at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$1.apply$mcV$sp(Dataset.scala:2735)
            at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$1.apply(Dataset.scala:2735)
            at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$1.apply(Dataset.scala:2735)
            at org.apache.spark.sql.Dataset$$anonfun$withNewRDDExecutionId$1.apply(Dataset.scala:3349)
            at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
            at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
            at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
            at org.apache.spark.sql.Dataset.withNewRDDExecutionId(Dataset.scala:3345)
            at org.apache.spark.sql.Dataset.foreachPartition(Dataset.scala:2734)
            at org.apache.seatunnel.spark.doris.sink.Doris.output(Doris.scala:59)
            at org.apache.seatunnel.spark.doris.sink.Doris.output(Doris.scala:30)
            at org.apache.seatunnel.spark.SparkEnvironment.sinkProcess(SparkEnvironment.java:179)
            at org.apache.seatunnel.spark.batch.SparkBatchExecution.start(SparkBatchExecution.java:54)
            at org.apache.seatunnel.core.spark.command.SparkTaskExecuteCommand.execute(SparkTaskExecuteCommand.java:67)
            ... 14 more
    Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.http.conn.ssl.SSLConnectionSocketFactory
            at org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:955)
            at org.apache.seatunnel.spark.doris.sink.DorisUtil$.createClient(DorisUtil.scala:44)
            at org.apache.seatunnel.spark.doris.sink.DorisUtil.saveMessages(DorisUtil.scala:108)
            at org.apache.seatunnel.spark.doris.sink.Doris$$anonfun$output$1.apply(Doris.scala:72)
            at org.apache.seatunnel.spark.doris.sink.Doris$$anonfun$output$1.apply(Doris.scala:59)
            at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
            at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
            at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
            at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
            at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
            at org.apache.spark.scheduler.Task.run(Task.scala:121)
            at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
            at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405)
            at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:745)
   ```
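
   A note for triage: `java.lang.NoSuchFieldError: INSTANCE` thrown from `SSLConnectionSocketFactory.<clinit>` is the classic symptom of mismatched `httpclient`/`httpcore` 4.x jars on the executor classpath (the `INSTANCE` field only exists in newer 4.x releases, so an older jar winning at class-load time breaks static initialization). A minimal diagnostic sketch, assuming a standard Spark-on-YARN layout; the helper name and the paths checked are illustrative, not part of SeaTunnel:

   ```shell
   # list_http_jars DIR — print any httpclient/httpcore jars found in DIR,
   # so conflicting versions can be spotted (helper name is hypothetical).
   list_http_jars() {
     find "$1" -maxdepth 1 \( -name 'httpclient*.jar' -o -name 'httpcore*.jar' \) 2>/dev/null | sort
     return 0  # tolerate directories that do not exist on this host
   }

   # Typical places to check (paths are assumptions; adjust to your install):
   list_http_jars "${SPARK_HOME:-/opt/spark}/jars"
   list_http_jars ./lib
   ```

   If the listing shows two different versions of either jar, keeping a single matching pair (or forcing the intended one with `spark.executor.userClassPathFirst=true`) is usually enough to clear the `NoSuchFieldError`.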
   
   
   ### Flink or Spark Version
   
   Spark 2.4
   
   ### Java or Scala Version
   
   Java 1.8
   
   ### Screenshots
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
