wzsgtc opened a new issue, #9188:
URL: https://github.com/apache/seatunnel/issues/9188

   ### Search before asking
   
   - [x] I had searched in the [issues](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.
   
   
   ### What happened
   
   I am currently pulling a large number of XML files from Huawei OBS to HDFS 
and ran into `com.obs.services.exception.ObsException: OBS servcie Error Message. Request Error: java.lang.OutOfMemoryError: Java heap space`.
   Is there any way to tune the job so that the bulk import completes without 
allocating more memory?
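
For reference, the bounded-memory behaviour being asked about can be illustrated with a plain Java stream copy that reuses one fixed-size buffer, so heap use does not grow with file size. This is only a minimal sketch of the general technique, not SeaTunnel's or the OBS SDK's actual code: `BoundedCopy`, `copy`, and the 8 KiB buffer size are hypothetical, and in-memory streams stand in for the OBS source and the HDFS sink.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BoundedCopy {

    /**
     * Copies everything from in to out through a single reusable buffer.
     * Returns the number of bytes copied.
     */
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize]; // the only allocation, reused on every iteration
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read); // write only the bytes actually read
            total += read;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] fakeXml = new byte[1_000_000]; // stand-in for one XML object
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for the sink
        long copied = copy(new ByteArrayInputStream(fakeXml), sink, 8 * 1024);
        System.out.println("copied " + copied + " bytes");
    }
}
```

With a real source and sink, the working set of such a loop is one buffer per concurrent reader, regardless of how many files are processed.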
   
   ### SeaTunnel Version
   
   SeaTunnel 2.3.10
   
   
   ### SeaTunnel Config
   
   ```conf
   env {
     parallelism = 4
     job.mode = "BATCH"
     # Memory tuning
     execution.buffer.timeout = "60s"
     execution.buffer.size = "10000"
     # Checkpoint tuning
     checkpoint.interval = "1000000"
     checkpoint.timeout = "300000"
     # Checkpoint tuning (newly added settings)
     checkpoint.data-persistence = "asynchronous"  # asynchronous persistence
     state.backend = "filesystem"    # explicitly set the state backend
     state.checkpoints.dir = "hdfs:///checkpoints"
   }
   
   source {
     ObsFile {
       # path = "/xml/publication_record_cn"
       path = "/xml/cn_utility_legal/"
       bucket = "obs://cnipa-byg"
       split_size = "128MB"       # merge small files into 128MB chunks for processing
       merge_partitions = true    # enable partition merging
       access_key = ""
       access_secret = ""
       endpoint = ""
       file_filter_pattern = "[A-Z0-9]+.XML"
       file_format_type = "binary"
       # Performance tuning parameters
       hadoop_config {
         fs.obs.threads.max = "32"        # maximum number of threads
         fs.obs.threads.core = "16"       # number of core threads
         fs.obs.multipart.size = "64MB"   # multipart upload part size
       }
     }
   
   }
   
   sink {
     HdfsFile {
       fs.defaultFS = "hdfs://hadoop-master.byg.com"
       path = "/user/jmp/test2"
       format = "parquet"  # keep consistent with the above
       hdfs_site_path = "/etc/hadoop/conf/hdfs-site.xml"
       core_site_path = "/etc/hadoop/conf/core-site.xml"
       filename_time_format = "yyyy.MM.dd"
       batch_size = 5000  # tunable, to prevent out-of-memory errors
       file_name_expression = "${transactionId}_${now}"
       # is_enable_transaction = false
       file_format_type = "parquet"
       # Advanced Parquet settings
       parquet_config {
         compression = "SNAPPY"         # enable compression
         enable_dictionary = true       # enable dictionary encoding
       }
       # Small-file compaction strategy
       compaction_strategy {
         type = "count"
         max_count = "1000"
       }
       flow_control {
         bytes_per_second = "100MB"  # limit the write rate
         qps_limit = 500            # maximum requests per second
       }
     }
   }
   ```
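
A side note on `file_filter_pattern` in the config above: if the connector evaluates it as a Java regular expression against the file name (an assumption here, not verified against the connector source), the unescaped `.` in `[A-Z0-9]+.XML` matches any character, not only a literal dot. The sketch below shows the difference with `java.util.regex`; `FilterPatternDemo` and the sample file names are hypothetical.

```java
import java.util.regex.Pattern;

public class FilterPatternDemo {

    /** Returns true when the whole file name matches the given regex. */
    static boolean matches(String regex, String fileName) {
        return Pattern.compile(regex).matcher(fileName).matches();
    }

    public static void main(String[] args) {
        String loose = "[A-Z0-9]+.XML";    // '.' matches any single character
        String strict = "[A-Z0-9]+\\.XML"; // '\\.' matches only a literal dot

        // Both patterns accept a normal file name.
        System.out.println(matches(loose, "CN29.XML"));
        System.out.println(matches(strict, "CN29.XML"));

        // Only the loose pattern accepts a name with no dot at all.
        System.out.println(matches(loose, "CN29XML"));
        System.out.println(matches(strict, "CN29XML"));
    }
}
```

This is unlikely to be related to the heap error, but it may select more files than intended.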
   
   ### Running Command
   
   ```shell
   /opt/seatunnel/bin/seatunnel.sh -DJvmOption="-Xmx12g -Xms12g \
   -XX:+UseG1GC -XX:MaxGCPauseMillis=150 \
   -XX:G1HeapRegionSize=4m -XX:InitiatingHeapOccupancyPercent=40 \
   -XX:+ParallelRefProcEnabled -XX:+HeapDumpOnOutOfMemoryError \
   -XX:HeapDumpPath=/tmp/heapdump.hprof" -c ./cn_publication_record.conf -e local
   ```
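
When diagnosing a heap error like this one, it can help to confirm that the `-Xmx` value on the command line actually reaches the JVM. The snippet below is a generic sketch of how a Java process can report its own heap ceiling using the standard `Runtime` API; `HeapCheck` is a hypothetical standalone class and does not inspect the SeaTunnel process itself.

```java
public class HeapCheck {

    /** Maximum heap the current JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run with e.g. `java -Xmx12g HeapCheck` to see the effect of the flag.
        System.out.println("max heap (MiB): " + maxHeapMiB());
    }
}
```

For an already running JVM, `jcmd <pid> VM.flags` lists the effective flags, including the maximum heap size.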
   
   ### Error Exception
   
   ```log
   Exception in thread "ForkJoinPool.commonPool-worker-9" 
java.lang.OutOfMemoryError: Java heap space
   2025-04-16 10:52:49,059 WARN  [o.a.h.f.o.OBSFileSystem       ] 
[BlockingWorker-TaskGroupLocation{jobId=964717591270522881, pipelineId=1, 
taskGroupId=4}] - OBSIOException occurred in lazySeek, retry: 0
   org.apache.hadoop.fs.obs.OBSIOException: Reopen at position 0 on 
obs://cnipa-byg/xml/cn_utility_legal/19940810/19940810-1-001/1/CN291994000003201000000000000000LEPRSZH19940810CN00G/CN291994000003201000000000000000LEPRSZH19940810CN00G.XML:
 status [-1] - request id [null] - error code [null] - error message [null] - 
trace :com.obs.services.exception.ObsException: OBS servcie Error Message. 
Request Error: java.lang.OutOfMemoryError: Java heap space : null
        at 
org.apache.hadoop.fs.obs.OBSUtils.translateException(OBSUtils.java:502) 
~[connector-file-obs-2.3.10.jar:2.3.10]
        at 
org.apache.hadoop.fs.obs.OBSInputStream.reopen(OBSInputStream.java:177) 
~[connector-file-obs-2.3.10.jar:2.3.10]
        at 
org.apache.hadoop.fs.obs.OBSInputStream.lazySeek(OBSInputStream.java:303) 
[connector-file-obs-2.3.10.jar:2.3.10]
        at 
org.apache.hadoop.fs.obs.OBSInputStream.read(OBSInputStream.java:398) 
[connector-file-obs-2.3.10.jar:2.3.10]
        at java.io.DataInputStream.read(DataInputStream.java:100) [?:1.8.0_412]
        at 
org.apache.seatunnel.connectors.seatunnel.file.source.reader.BinaryReadStrategy.read(BinaryReadStrategy.java:74)
 [connector-file-obs-2.3.10.jar:2.3.10]
        at 
org.apache.seatunnel.connectors.seatunnel.file.source.BaseFileSourceReader.pollNext(BaseFileSourceReader.java:63)
 [connector-file-obs-2.3.10.jar:2.3.10]
        at 
org.apache.seatunnel.engine.server.task.flow.SourceFlowLifeCycle.collect(SourceFlowLifeCycle.java:159)
 [seatunnel-starter.jar:2.3.10]
        at 
org.apache.seatunnel.engine.server.task.SourceSeaTunnelTask.collect(SourceSeaTunnelTask.java:127)
 [seatunnel-starter.jar:2.3.10]
        at 
org.apache.seatunnel.engine.server.task.SeaTunnelTask.stateProcess(SeaTunnelTask.java:169)
 [seatunnel-starter.jar:2.3.10]
        at 
org.apache.seatunnel.engine.server.task.SourceSeaTunnelTask.call(SourceSeaTunnelTask.java:132)
 [seatunnel-starter.jar:2.3.10]
        at 
org.apache.seatunnel.engine.server.TaskExecutionService$BlockingWorker.run(TaskExecutionService.java:694)
 [seatunnel-starter.jar:2.3.10]
        at 
org.apache.seatunnel.engine.server.TaskExecutionService$NamedTaskWrapper.run(TaskExecutionService.java:1019)
 [seatunnel-starter.jar:2.3.10]
        at 
org.apache.seatunnel.api.tracing.MDCRunnable.run(MDCRunnable.java:43) 
[seatunnel-starter.jar:2.3.10]
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
[?:1.8.0_412]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
[?:1.8.0_412]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[?:1.8.0_412]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[?:1.8.0_412]
        at java.lang.Thread.run(Thread.java:750) [?:1.8.0_412]
   Caused by: com.obs.services.exception.ObsException: OBS servcie Error 
Message. Request Error: java.lang.OutOfMemoryError: Java heap space
        at 
com.obs.services.internal.utils.ServiceUtils.changeFromServiceException(ServiceUtils.java:749)
 ~[connector-file-obs-2.3.10.jar:2.3.10]
        at com.obs.services.ObsClient.doActionWithResult(ObsClient.java:2707) 
~[connector-file-obs-2.3.10.jar:2.3.10]
        at com.obs.services.ObsClient.getObject(ObsClient.java:1607) 
~[connector-file-obs-2.3.10.jar:2.3.10]
        at 
org.apache.hadoop.fs.obs.OBSInputStream.reopen(OBSInputStream.java:171) 
~[connector-file-obs-2.3.10.jar:2.3.10]
        ... 17 more
   Caused by: java.lang.OutOfMemoryError: Java heap space
   2025-04-16 10:52:49,060 INFO  [o.a.s.c.u.RetryUtils          ] 
[event-forwarder-0] - Failed to execute due to java.lang.NullPointerException: 
Target cannot be null!
        at 
com.hazelcast.internal.util.Preconditions.checkNotNull(Preconditions.java:59)
        at 
com.hazelcast.spi.impl.operationservice.impl.OperationServiceImpl.createInvocationBuilder(OperationServiceImpl.java:300)
        at 
org.apache.seatunnel.engine.server.utils.NodeEngineUtil.sendOperationToMasterNode(NodeEngineUtil.java:37)
        at 
org.apache.seatunnel.engine.server.EventService.lambda$initEventForwardService$1(EventService.java:72)
        at 
org.apache.seatunnel.common.utils.RetryUtils.retryWithException(RetryUtils.java:48)
        at 
org.apache.seatunnel.engine.server.EventService.lambda$initEventForwardService$2(EventService.java:70)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
   . Retrying attempt (1/2) after backoff of 0 ms
   2025-04-16 10:52:49,061 INFO  [o.a.s.c.u.RetryUtils          ] 
[event-forwarder-0] - Failed to execute due to java.lang.NullPointerException: 
Target cannot be null!
        at 
com.hazelcast.internal.util.Preconditions.checkNotNull(Preconditions.java:59)
        at 
com.hazelcast.spi.impl.operationservice.impl.OperationServiceImpl.createInvocationBuilder(OperationServiceImpl.java:300)
        at 
org.apache.seatunnel.engine.server.utils.NodeEngineUtil.sendOperationToMasterNode(NodeEngineUtil.java:37)
        at 
org.apache.seatunnel.engine.server.EventService.lambda$initEventForwardService$1(EventService.java:72)
        at 
org.apache.seatunnel.common.utils.RetryUtils.retryWithException(RetryUtils.java:48)
        at 
org.apache.seatunnel.engine.server.EventService.lambda$initEventForwardService$2(EventService.java:70)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
   . Retrying attempt (2/2) after backoff of 0 ms
   2025-04-16 10:52:49,061 WARN  [o.a.s.e.s.EventService        ] 
[event-forwarder-0] - Event forward failed, discard events 1
   java.lang.RuntimeException: Execute given execution failed after retry 2 
times
        at 
org.apache.seatunnel.common.utils.RetryUtils.retryWithException(RetryUtils.java:75)
 ~[seatunnel-starter.jar:2.3.10]
        at 
org.apache.seatunnel.engine.server.EventService.lambda$initEventForwardService$2(EventService.java:70)
 ~[seatunnel-starter.jar:2.3.10]
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
[?:1.8.0_412]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
[?:1.8.0_412]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[?:1.8.0_412]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[?:1.8.0_412]
        at java.lang.Thread.run(Thread.java:750) [?:1.8.0_412]
   Caused by: java.lang.NullPointerException: Target cannot be null!
        at 
com.hazelcast.internal.util.Preconditions.checkNotNull(Preconditions.java:59) 
~[seatunnel-starter.jar:2.3.10]
        at 
com.hazelcast.spi.impl.operationservice.impl.OperationServiceImpl.createInvocationBuilder(OperationServiceImpl.java:300)
 ~[seatunnel-starter.jar:2.3.10]
        at 
org.apache.seatunnel.engine.server.utils.NodeEngineUtil.sendOperationToMasterNode(NodeEngineUtil.java:37)
 ~[seatunnel-starter.jar:2.3.10]
        at 
org.apache.seatunnel.engine.server.EventService.lambda$initEventForwardService$1(EventService.java:72)
 ~[seatunnel-starter.jar:2.3.10]
        at 
org.apache.seatunnel.common.utils.RetryUtils.retryWithException(RetryUtils.java:48)
 ~[seatunnel-starter.jar:2.3.10]
        ... 6 more
   2025-04-16 10:52:49,063 WARN  [o.a.s.e.s.TaskExecutionService] 
[BlockingWorker-TaskGroupLocation{jobId=964717591270522881, pipelineId=1, 
taskGroupId=4}] - [localhost]:5801 [seatunnel-153833] [5.1] Exception in 
org.apache.seatunnel.engine.server.task.SourceSeaTunnelTask@65934e20
   org.apache.seatunnel.common.exception.SeaTunnelRuntimeException: 
ErrorCode:[COMMON-01], ErrorDescription:[SeaTunnel read file 
'obs://cnipa-byg/xml/cn_utility_legal/19940810/19940810-1-001/1/CN291994000003201000000000000000LEPRSZH19940810CN00G/CN291994000003201000000000000000LEPRSZH19940810CN00G.XML'
 failed.]
        at 
org.apache.seatunnel.common.exception.CommonError.fileOperationFailed(CommonError.java:71)
 ~[seatunnel-starter.jar:2.3.10]
        at 
org.apache.seatunnel.connectors.seatunnel.file.source.BaseFileSourceReader.pollNext(BaseFileSourceReader.java:65)
 ~[connector-file-obs-2.3.10.jar:2.3.10]
        at 
org.apache.seatunnel.engine.server.task.flow.SourceFlowLifeCycle.collect(SourceFlowLifeCycle.java:159)
 ~[seatunnel-starter.jar:2.3.10]
        at 
org.apache.seatunnel.engine.server.task.SourceSeaTunnelTask.collect(SourceSeaTunnelTask.java:127)
 ~[seatunnel-starter.jar:2.3.10]
        at 
org.apache.seatunnel.engine.server.task.SeaTunnelTask.stateProcess(SeaTunnelTask.java:169)
 ~[seatunnel-starter.jar:2.3.10]
        at 
org.apache.seatunnel.engine.server.task.SourceSeaTunnelTask.call(SourceSeaTunnelTask.java:132)
 ~[seatunnel-starter.jar:2.3.10]
        at 
org.apache.seatunnel.engine.server.TaskExecutionService$BlockingWorker.run(TaskExecutionService.java:694)
 [seatunnel-starter.jar:2.3.10]
        at 
org.apache.seatunnel.engine.server.TaskExecutionService$NamedTaskWrapper.run(TaskExecutionService.java:1019)
 [seatunnel-starter.jar:2.3.10]
        at 
org.apache.seatunnel.api.tracing.MDCRunnable.run(MDCRunnable.java:43) 
[seatunnel-starter.jar:2.3.10]
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
[?:1.8.0_412]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
[?:1.8.0_412]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[?:1.8.0_412]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[?:1.8.0_412]
        at java.lang.Thread.run(Thread.java:750) [?:1.8.0_412]
   Caused by: org.apache.hadoop.fs.obs.OBSIOException: Reopen at position 0 on 
obs://cnipa-byg/xml/cn_utility_legal/19940810/19940810-1-001/1/CN291994000003201000000000000000LEPRSZH19940810CN00G/CN291994000003201000000000000000LEPRSZH19940810CN00G.XML:
 status [-1] - request id [null] - error code [null] - error message [null] - 
trace :com.obs.services.exception.ObsException: OBS servcie Error Message. 
Request Error: java.lang.OutOfMemoryError: Java heap space : null
        at 
org.apache.hadoop.fs.obs.OBSUtils.translateException(OBSUtils.java:502) 
~[connector-file-obs-2.3.10.jar:2.3.10]
        at 
org.apache.hadoop.fs.obs.OBSInputStream.reopen(OBSInputStream.java:177) 
~[connector-file-obs-2.3.10.jar:2.3.10]
        at 
org.apache.hadoop.fs.obs.OBSInputStream.lazySeek(OBSInputStream.java:303) 
~[connector-file-obs-2.3.10.jar:2.3.10]
        at 
org.apache.hadoop.fs.obs.OBSInputStream.read(OBSInputStream.java:398) 
~[connector-file-obs-2.3.10.jar:2.3.10]
        at java.io.DataInputStream.read(DataInputStream.java:100) ~[?:1.8.0_412]
        at 
org.apache.seatunnel.connectors.seatunnel.file.source.reader.BinaryReadStrategy.read(BinaryReadStrategy.java:74)
 ~[connector-file-obs-2.3.10.jar:2.3.10]
        at 
org.apache.seatunnel.connectors.seatunnel.file.source.BaseFileSourceReader.pollNext(BaseFileSourceReader.java:63)
 ~[connector-file-obs-2.3.10.jar:2.3.10]
        ... 12 more
   Caused by: com.obs.services.exception.ObsException: OBS servcie Error 
Message. Request Error: java.lang.OutOfMemoryError: Java heap space
        at 
com.obs.services.internal.utils.ServiceUtils.changeFromServiceException(ServiceUtils.java:749)
 ~[connector-file-obs-2.3.10.jar:2.3.10]
   ```
   
   ### Zeta or Flink or Spark Version
   
   _No response_
   
   ### Java or Scala Version
   
   _No response_
   
   ### Screenshots
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.