lyshyhuangli opened a new issue, #6920: URL: https://github.com/apache/seatunnel/issues/6920
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.

### What happened

I am using SeaTunnel in DolphinScheduler (version 3.1.2) to ingest MySQL data into ClickHouse:

1. One MySQL table holds about 140 million rows; a single full-volume sync of it to ClickHouse fails.
2. With the same configuration, syncing in batches of roughly 11 million rows per batch succeeds.

My questions:

1. For a one-shot full sync, if the MySQL table is too large for the memory allocated to Spark to hold all the data, can SeaTunnel move the data into ClickHouse piece by piece (the way Kettle does) instead of simply aborting the ClickHouse write?
2. If SeaTunnel can ingest piece by piece like Kettle, does the failure mean something in my configuration is wrong?

### SeaTunnel Version

2.3.0

### SeaTunnel Config

```conf
"env" : {
  "spark.app.name" : "ods_t3070220000029_000203_v5_all",
  "spark.executor.instances" : 2,
  "spark.executor.cores" : 6,
  "spark.executor.memory" : "10g",
  "spark.network.timeout" : 10000000,
  "spark.executor.heartbeatInterval" : 1000000,
  "spark.yarn.executor.memoryOverhead" : 1024,
  "spark.yarn.driver.memoryOverhead" : 1024,
  "spark.yarn.max.executor.failures" : 4,
  "spark.task.cpus" : 4
},
```

### Running Command

```shell
${SPARK_HOME}/bin/spark-submit \
  --class "org.apache.seatunnel.core.starter.spark.SeatunnelSpark" \
  --name "SeaTunnel" \
  --master "yarn" \
  --deploy-mode "client" \
  --jars "/opt/server/seatunnel/plugins/jdbc/lib/ali-phoenix-shaded-thin-client-5.2.5-HBase-2.x.jar,/opt/server/seatunnel/plugins/jdbc/lib/mysql-connector-java-8.0.27.jar,/opt/server/seatunnel/plugins/jdbc/lib/postgresql-42.4.3.jar,/opt/server/seatunnel/plugins/jdbc/lib/DmJdbcDriver18-8.1.2.141.jar,/opt/server/seatunnel/plugins/jdbc/lib/mssql-jdbc-9.2.1.jre8.jar,/opt/server/seatunnel/plugins/jdbc/lib/ojdbc8-12.2.0.1.jar,/opt/server/seatunnel/plugins/jdbc/lib/sqlite-jdbc-3.39.3.0.jar,/opt/server/seatunnel/plugins/jdbc/lib/db2jcc-db2jcc4.jar,/opt/server/seatunnel/plugins/jdbc/lib/tablestore-jdbc-5.13.9.jar,/opt/server/seatunnel/plugins/jdbc/lib/terajdbc4-17.20.00.12.jar,/opt/server/seatunnel/plugins/jdbc/lib/redshift-jdbc42-2.1.0.9.jar,/opt/server/seatunnel/lib/seatunnel-transforms-v2.jar,/opt/server/seatunnel/lib/hadoop-aws-3.1.4.jar,/opt/server/seatunnel/lib/aws-java-sdk-bundle-1.11.271.jar,/opt/server/seatunnel/lib/seatunnel-hadoop3-3.1.4-uber-2.3.0-2.11.12.jar,/opt/server/seatunnel/connectors/seatunnel/connector-clickhouse-2.3.0.jar,/opt/server/seatunnel/connectors/seatunnel/connector-jdbc-2.3.0.jar" \
  --conf "spark.executor.memory=10g" \
  --conf "spark.task.cpus=4" \
  --conf "spark.yarn.driver.memoryOverhead=1024" \
  --conf "spark.executor.heartbeatInterval=1000000" \
  --conf "spark.yarn.max.executor.failures=4" \
  --conf "spark.network.timeout=10000000" \
  --conf "spark.executor.cores=6" \
  --conf "spark.app.name=ods_t3070220000029_000203_v5_all" \
  --conf "spark.yarn.executor.memoryOverhead=1024" \
  --conf "spark.executor.instances=2" \
  /opt/server/seatunnel/starter/seatunnel-spark-starter.jar \
  --config "/tmp/dolphinscheduler/exec/process/dps/13065104481120/13124201617382_13/1100/1782/seatunnel_1100_1782.conf" \
  --master "yarn" \
  --deploy-mode "client"
```

### Error Exception

```log
24/04/02 13:34:19 ERROR v2.WriteToDataSourceV2Exec: Data source writer org.apache.seatunnel.translation.spark.sink.SparkDataSourceWriter@71d0b8a4 is aborting.
24/04/02 13:34:19 ERROR v2.WriteToDataSourceV2Exec: Data source writer org.apache.seatunnel.translation.spark.sink.SparkDataSourceWriter@71d0b8a4 aborted.
24/04/02 13:34:19 ERROR command.SparkApiTaskExecuteCommand: Run SeaTunnel on spark failed.
org.apache.spark.SparkException: Writing job aborted.
	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:92)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:259)
	at org.apache.seatunnel.core.starter.spark.execution.SinkExecuteProcessor.execute(SinkExecuteProcessor.java:85)
	at org.apache.seatunnel.core.starter.spark.execution.SparkExecution.execute(SparkExecution.java:61)
	at org.apache.seatunnel.core.starter.spark.command.SparkApiTaskExecuteCommand.execute(SparkApiTaskExecuteCommand.java:55)
	at org.apache.seatunnel.core.starter.Seatunnel.run(Seatunnel.java:39)
	at org.apache.seatunnel.core.starter.spark.SeatunnelSpark.main(SeatunnelSpark.java:34)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:851)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:926)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:935)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Aborting TaskSet 0.0 because task 0 (partition 0)
```

### Zeta or Flink or Spark Version

Spark 2.3

### Java or Scala Version

Java 1.8

### Screenshots

_No response_

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
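For reference, the chunked transfer asked about above maps onto existing connector options: the SeaTunnel JDBC source can split a large table read into partitions via `partition_column`/`partition_num`, and the ClickHouse sink flushes rows in bounded batches controlled by `bulk_size`, so no single task has to hold the whole table in memory. A minimal sketch of the source/sink sections under those assumptions — host names, credentials, table and column names are placeholders, and option names should be verified against the 2.3.x connector docs:

```conf
source {
  Jdbc {
    # placeholder connection details
    url = "jdbc:mysql://mysql-host:3306/ods_db"
    driver = "com.mysql.cj.jdbc.Driver"
    user = "placeholder_user"
    password = "placeholder_password"
    query = "select * from big_table"
    # split the read on a numeric column so Spark tasks each
    # scan one range instead of the whole 140M-row table
    partition_column = "id"
    partition_num = 100
  }
}

sink {
  Clickhouse {
    host = "clickhouse-host:8123"
    database = "ods"
    table = "big_table"
    username = "placeholder_user"
    password = "placeholder_password"
    # rows buffered per insert; smaller values reduce memory pressure
    bulk_size = 20000
  }
}
```

If a partitioned read like this still aborts, the actual task-level cause (the executor-side exception above the `Writing job aborted` frame, e.g. an OOM or a ClickHouse timeout) would be needed to diagnose further.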
