kyehe opened a new issue, #4367: URL: https://github.com/apache/incubator-seatunnel/issues/4367
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/incubator-seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.

### What happened

source: MySQL (split by primary key `id` into 10 partitions)
sink: Hive
spark: 2 tasks running in parallel

A NullPointerException and an IndexOutOfBoundsException occurred when writing some rows. The detailed stack traces are below. I picked the failing rows from the exception log and synced them with a new job; they could be written into Hive without any exception.

### SeaTunnel Version

2.3.0

### SeaTunnel Config

```conf
env {
  spark.app.name = " SeaTunnel Spark Job"
  spark.dynamicAllocation.enabled = false
  spark.executor.instances = 2
  spark.executor.cores = 1
  spark.executor.memory = "10g"
  spark.driver.memory = "1g"
  spark.dynamicAllocation.minExecutors = 1
  spark.executor.memoryOverhead = 1g
  spark.executor.heartbeatInterval = 120s
  parallelism = 10
}

source {
  jdbc {
    driver = "com.mysql.cj.jdbc.Driver"
    table = ["xxx"]
    url = "jdbc:mysql://xxxx?useUnicode=true&characterEncoding=utf8&useSSL=false&useFastDateParsing=false&yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true&autoReconnect=true"
    user = "xxx"
    password = "xxx"
    query = "select xxx from xxx"
    result_table_name = "source_table"
    partition_column = "id"
    partition_num = 10
  }
}

transform {
}

sink {
  # choose stdout output plugin to output data to console
  Hive {
    source_table_name = "source_table"
    metastore_uri = "xxx"
    table_name = "xxx.xxx"
    partition_dir_expression = "dt=2023-03-16"
    tmp_path = "hdfs://xxx/tmp/seatunnel"
    defaultFS = "hdfs://xxx"
    password = "xxx"
    username = "xxx"
  }
}
```

### Running Command

```shell
spark-submit \
  --class org.apache.seatunnel.core.starter.spark.SeatunnelSpark \
  --jars connector-hive-2.1.3-SNAPSHOT.jar,connector-jdbc-2.1.3-SNAPSHOT.jar \
  --files "spark.batch.mysql2hive_file.conf" \
  seatunnel-spark-starter.jar \
  -m "master" \
  -e "cluster" \
  --config spark.batch.mysql2hive_file.conf
```

### Error Exception

```log
NullPointerException Stack:
23/03/17 14:45:13 INFO source.JdbcSourceSplitEnumerator: Assigning splits to readers [JdbcSourceSplit(parameterValues=[29678096, 33917822], splitId=7)]
23/03/17 14:45:15 INFO impl.PhysicalFsWriter: ORC writer created for path: hdfs://xxx/hive/tmp/seatunnel/166c8e02987647b2b355717fee1c7ddc/T_166c8e02987647b2b355717fee1c7ddc_7_1/dt=2023-03-16/T_166c8e02987647b2b355717fee1c7ddc_7_1.orc_0.orc with stripeSize: 67108864 blockSize: 268435456 compression: SNAPPY bufferSize: 262144
23/03/17 14:45:15 INFO impl.WriterImpl: ORC writer created for path: hdfs://xxx/hive/tmp/seatunnel/166c8e02987647b2b355717fee1c7ddc/T_166c8e02987647b2b355717fee1c7ddc_7_1/dt=2023-03-16/T_166c8e02987647b2b355717fee1c7ddc_7_1.orc_0.orc with stripeSize: 67108864 blockSize: 268435456 compression: SNAPPY bufferSize: 262144
23/03/17 14:50:23 ERROR util.Utils: Aborting task
java.lang.RuntimeException: Write data error, please check
	at org.apache.seatunnel.connectors.seatunnel.file.sink.BaseFileSinkWriter.write(BaseFileSinkWriter.java:65)
	at org.apache.seatunnel.connectors.seatunnel.file.sink.BaseFileSinkWriter.write(BaseFileSinkWriter.java:32)
	at org.apache.seatunnel.translation.spark.sink.SparkDataWriter.write(SparkDataWriter.java:58)
	at org.apache.seatunnel.translation.spark.sink.SparkDataWriter.write(SparkDataWriter.java:37)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$$anonfun$run$3.apply(WriteToDataSourceV2Exec.scala:118)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$$anonfun$run$3.apply(WriteToDataSourceV2Exec.scala:116)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1397)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:146)
	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:67)
	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:66)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:121)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1363)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Write data [[ "xxx", "xxx", xxx ]] to orc file [hdfs://xxx/hive/tmp/seatunnel/166c8e02987647b2b355717fee1c7ddc/T_166c8e02987647b2b355717fee1c7ddc_7_1/dt=2023-03-16/T_166c8e02987647b2b355717fee1c7ddc_7_1.orc_0.orc] error
	at org.apache.seatunnel.connectors.seatunnel.file.sink.writer.OrcWriteStrategy.write(OrcWriteStrategy.java:83)
	at org.apache.seatunnel.connectors.seatunnel.file.sink.BaseFileSinkWriter.write(BaseFileSinkWriter.java:63)
	... 17 more
Caused by: java.lang.NullPointerException
	at java.lang.System.arraycopy(Native Method)
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.DynamicByteArray.add(DynamicByteArray.java:115)
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.StringRedBlackTree.addNewKey(StringRedBlackTree.java:48)
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.StringRedBlackTree.add(StringRedBlackTree.java:60)
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.writer.StringTreeWriter.writeBatch(StringTreeWriter.java:70)
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.writer.StructTreeWriter.writeRootBatch(StructTreeWriter.java:56)
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.WriterImpl.addRowBatch(WriterImpl.java:557)
	at org.apache.seatunnel.connectors.seatunnel.file.sink.writer.OrcWriteStrategy.write(OrcWriteStrategy.java:78)
	... 18 more
23/03/17 14:50:23 ERROR v2.DataWritingSparkTask: Aborting commit for partition 7 (task 7, attempt 0, stage 0.0)

IndexOutOfBoundsException Stack:
23/03/17 08:26:12 INFO source.JdbcSourceSplitEnumerator: Starting to calculate splits.
23/03/17 08:26:12 INFO source.JdbcSourceSplitEnumerator: Calculated splits successfully, the size of splits is 60.
23/03/17 08:26:12 INFO source.JdbcSourceSplitEnumerator: Assigning splits to readers [JdbcSourceSplit(parameterValues=[31790864, 32497326], splitId=45)]
23/03/17 08:26:14 INFO impl.PhysicalFsWriter: ORC writer created for path: hdfs://xxx/hive/tmp/seatunnel/d80f34efa2b447cfa43c57e810d2ca78/T_d80f34efa2b447cfa43c57e810d2ca78_46_1/dt=2023-03-16/T_d80f34efa2b447cfa43c57e810d2ca78_46_1.orc_0.orc with stripeSize: 67108864 blockSize: 268435456 compression: SNAPPY bufferSize: 262144
23/03/17 08:26:14 INFO impl.OrcCodecPool: Got brand-new codec SNAPPY
23/03/17 08:26:14 INFO impl.WriterImpl: ORC writer created for path: hdfs://xxx/hive/tmp/seatunnel/d80f34efa2b447cfa43c57e810d2ca78/T_d80f34efa2b447cfa43c57e810d2ca78_46_1/dt=2023-03-16/T_d80f34efa2b447cfa43c57e810d2ca78_46_1.orc_0.orc with stripeSize: 67108864 blockSize: 268435456 compression: SNAPPY bufferSize: 262144
23/03/17 08:26:35 ERROR util.Utils: Aborting task
java.lang.RuntimeException: Write data error, please check
	at org.apache.seatunnel.connectors.seatunnel.file.sink.BaseFileSinkWriter.write(BaseFileSinkWriter.java:65)
	at org.apache.seatunnel.connectors.seatunnel.file.sink.BaseFileSinkWriter.write(BaseFileSinkWriter.java:32)
	at org.apache.seatunnel.translation.spark.sink.SparkDataWriter.write(SparkDataWriter.java:58)
	at org.apache.seatunnel.translation.spark.sink.SparkDataWriter.write(SparkDataWriter.java:37)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$$anonfun$run$3.apply(WriteToDataSourceV2Exec.scala:118)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$$anonfun$run$3.apply(WriteToDataSourceV2Exec.scala:116)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1397)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:146)
	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:67)
	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:66)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:121)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1363)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Write data [[ "xxx", "xxx", xxx ]] to orc file [hdfs://xxx/hive/tmp/seatunnel/d80f34efa2b447cfa43c57e810d2ca78/T_d80f34efa2b447cfa43c57e810d2ca78_46_1/dt=2023-03-16/T_d80f34efa2b447cfa43c57e810d2ca78_46_1.orc_0.orc] error
	at org.apache.seatunnel.connectors.seatunnel.file.sink.writer.OrcWriteStrategy.write(OrcWriteStrategy.java:83)
	at org.apache.seatunnel.connectors.seatunnel.file.sink.BaseFileSinkWriter.write(BaseFileSinkWriter.java:63)
	... 17 more
Caused by: java.lang.IndexOutOfBoundsException: Index 3473 is outside of 0..3472
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.DynamicIntArray.get(DynamicIntArray.java:73)
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.StringRedBlackTree$VisitorContextImpl.setPosition(StringRedBlackTree.java:143)
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.StringRedBlackTree.recurse(StringRedBlackTree.java:156)
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.StringRedBlackTree.recurse(StringRedBlackTree.java:158)
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.StringRedBlackTree.recurse(StringRedBlackTree.java:158)
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.StringRedBlackTree.recurse(StringRedBlackTree.java:158)
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.StringRedBlackTree.recurse(StringRedBlackTree.java:158)
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.StringRedBlackTree.recurse(StringRedBlackTree.java:155)
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.StringRedBlackTree.recurse(StringRedBlackTree.java:158)
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.StringRedBlackTree.recurse(StringRedBlackTree.java:158)
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.StringRedBlackTree.recurse(StringRedBlackTree.java:158)
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.StringRedBlackTree.recurse(StringRedBlackTree.java:158)
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.StringRedBlackTree.recurse(StringRedBlackTree.java:155)
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.StringRedBlackTree.recurse(StringRedBlackTree.java:155)
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.StringRedBlackTree.recurse(StringRedBlackTree.java:155)
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.StringRedBlackTree.recurse(StringRedBlackTree.java:155)
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.StringRedBlackTree.recurse(StringRedBlackTree.java:158)
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.StringRedBlackTree.recurse(StringRedBlackTree.java:155)
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.StringRedBlackTree.recurse(StringRedBlackTree.java:155)
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.StringRedBlackTree.recurse(StringRedBlackTree.java:155)
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.StringRedBlackTree.visit(StringRedBlackTree.java:168)
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.writer.StringBaseTreeWriter.flushDictionary(StringBaseTreeWriter.java:139)
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.writer.StringBaseTreeWriter.flushStreams(StringBaseTreeWriter.java:284)
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.writer.StructTreeWriter.flushStreams(StructTreeWriter.java:161)
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.WriterImpl.flushStripe(WriterImpl.java:460)
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.WriterImpl.checkMemory(WriterImpl.java:266)
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.MemoryManagerImpl.notifyWriters(MemoryManagerImpl.java:190)
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.MemoryManagerImpl.addedRow(MemoryManagerImpl.java:178)
	at org.apache.seatunnel.shade.connector.hive.org.apache.orc.impl.WriterImpl.addRowBatch(WriterImpl.java:569)
	at org.apache.seatunnel.connectors.seatunnel.file.sink.writer.OrcWriteStrategy.write(OrcWriteStrategy.java:78)
	... 18 more
23/03/17 08:26:35 ERROR v2.DataWritingSparkTask: Aborting commit for partition 45 (task 46, attempt 0, stage 0.0)
```

### Flink or Spark Version

spark 2.4

### Java or Scala Version

java: 1.8
scala: 2.11.12

### Screenshots

_No response_

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
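For context on the `parameterValues=[29678096, 33917822]` bounds in the enumerator log: with `partition_column = "id"` and `partition_num = 10`, the source divides the primary-key range into closed intervals and runs one bounded query per split. The sketch below is only an illustration of that idea under the assumption of an even range division; `computeSplits` is a hypothetical helper, not SeaTunnel's actual `JdbcSourceSplitEnumerator` logic (the log's "size of splits is 60" shows the real enumerator can produce more splits than `partition_num`).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: divide the key range [min, max] evenly into
// partitionNum closed intervals, yielding bounds analogous to the
// parameterValues=[lower, upper] pairs in the enumerator log.
public class SplitSketch {
    public static List<long[]> computeSplits(long min, long max, int partitionNum) {
        List<long[]> splits = new ArrayList<>();
        long total = max - min + 1;
        long base = total / partitionNum;   // rows per split
        long extra = total % partitionNum;  // first 'extra' splits get one more row
        long start = min;
        for (int i = 0; i < partitionNum; i++) {
            long end = start + base - 1 + (i < extra ? 1 : 0);
            splits.add(new long[] {start, end});
            start = end + 1;
        }
        return splits;
    }

    public static void main(String[] args) {
        // Each pair becomes a bounded query, e.g. WHERE id BETWEEN lower AND upper
        for (long[] s : computeSplits(1, 100, 10)) {
            System.out.println("[" + s[0] + ", " + s[1] + "]");
        }
    }
}
```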
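Both failures sit inside the ORC writer's dictionary structures (`DynamicByteArray`, `StringRedBlackTree`, `DynamicIntArray`), and the IndexOutOfBoundsException is triggered from `MemoryManagerImpl.notifyWriters`, i.e. while one code path flushes the dictionary as another appends to it. That pattern is consistent with a single, unsynchronized ORC `Writer` being touched by more than one thread, since ORC writers are not thread-safe. The class below is a hypothetical, heavily simplified model of chunked byte storage in the spirit of `DynamicByteArray` (not the real ORC code); `corruptForDemo` artificially produces the inconsistent state a race can leave behind, showing how `System.arraycopy` then fails with the same NullPointerException as in the report.

```java
import java.util.Arrays;

// Hypothetical model, NOT ORC's real DynamicByteArray. grow() trusts a
// count of already-initialized chunks; if that count and the actual chunk
// allocations fall out of sync (as unsynchronized concurrent writers can
// cause), System.arraycopy receives a null chunk and throws NPE.
public class ChunkedByteArray {
    private static final int CHUNK_SIZE = 8;
    private byte[][] data = new byte[2][];
    private int initializedChunks = 0; // invariant: data[0..initializedChunks-1] != null
    private int length = 0;

    private void grow(int chunkIndex) {
        if (chunkIndex >= initializedChunks) { // safe only single-threaded
            if (chunkIndex >= data.length) {
                data = Arrays.copyOf(data, chunkIndex * 2 + 1);
            }
            for (int i = initializedChunks; i <= chunkIndex; i++) {
                data[i] = new byte[CHUNK_SIZE];
            }
            initializedChunks = chunkIndex + 1;
        }
    }

    public void add(byte[] value, int offset, int len) {
        int remaining = len, srcPos = offset;
        while (remaining > 0) {
            int chunkIndex = length / CHUNK_SIZE;
            grow(chunkIndex);
            int inChunk = length % CHUNK_SIZE;
            int toCopy = Math.min(remaining, CHUNK_SIZE - inChunk);
            // NPE here when initializedChunks claims a chunk that is null
            System.arraycopy(value, srcPos, data[chunkIndex], inChunk, toCopy);
            length += toCopy; srcPos += toCopy; remaining -= toCopy;
        }
    }

    public int size() { return length; }

    // Simulates the inconsistent state an unsynchronized race can leave:
    // the counter says chunks exist, but the allocations never happened.
    void corruptForDemo() { initializedChunks = data.length; }

    public static void main(String[] args) {
        ChunkedByteArray ok = new ChunkedByteArray();
        ok.add("hello world".getBytes(), 0, 11); // fine when single-threaded
        System.out.println("size=" + ok.size());

        ChunkedByteArray raced = new ChunkedByteArray();
        raced.corruptForDemo();
        try {
            raced.add("x".getBytes(), 0, 1);
        } catch (NullPointerException e) {
            System.out.println("NPE from System.arraycopy, as in the report");
        }
    }
}
```

If this reading is right, the fix direction would be ensuring each Spark task gets its own writer instance (or serializing access to a shared one), rather than anything about the data itself, which matches the observation that the "failing" rows load fine in a fresh job.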
