dwave opened a new issue, #8040:
URL: https://github.com/apache/seatunnel/issues/8040

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.
   
   
   ### What happened
   
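   When using the LocalFile source to import a local Excel file with more than 500,000 rows, the job gets progressively slower and eventually fails with an OOM (OutOfMemoryError).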
   
![image](https://github.com/user-attachments/assets/0affb8af-7c74-471a-89b2-16aff014662e)
   
![image](https://github.com/user-attachments/assets/956a4d91-4d15-48d0-84e7-53c4ad8646cc)
   
   
   ### SeaTunnel Version
   
   2.3.8
   
   ### SeaTunnel Config
   
   ```conf
   env {
   "job.mode"=BATCH
   "job.name"="SeaTunnel_Job"
   "savemode.execute.location"=CLUSTER
   }
   source {
     LocalFile{
       result_table_name = "fake1"
       delimiter = "#"
       skip_header_row_number = 1
       path = "/data/sale_detail_v2.xlsx"
       file_format_type = "excel"
       datatime_format = "yyyy-MM-dd HH:mm:ss"
       schema {
           fields {
               order_id = string
               order_label = string
               payment_time = string
               product_name = string
               product_type = string
               product_specification = string
               purchase_quantity = string
               appointment_store = string
               applicable_store_type = string
               product_id = string
               coupon_code = string
               coupon_status = string
               total_times = string
               redeemed_times = string
               remaining_times = string
               redemption_time = string
               user_strike_price = string
               user_single_strike_price = string
               voucher_redeemed_value = string
               order_actual_received = string
               selling_amount = string
               merchant_subsidy = string
               merchant_subsidy_details = string
               product_payment = string
               platform_subsidy = string
               platform_subsidy_details = string
               brand_merchant_subsidy = string
               brand_merchant_subsidy_details = string
               software_service_fee = string
               talent_commission = string
               artisan_incentive_commission = string
               service_provider_commission = string
               insurance_cost = string
               pre_sale_price = string
               appointment_surcharge = string
               software_service_fee_rate = string
               talent_commission_rate = string
               artisan_incentive_commission_rate = string
               service_provider_commission_rate = string
               service_provider_name = string
               sales_role = string
               sales_channel = string
               order_owner_nickname = string
               order_owner_uid = string
               talent_nickname = string
               talent_douyin_number = string
               talent_uid = string
               content_address = string
               artisan_nickname = string
               artisan_douyin_number = string
               artisan_uid = string
               store_staff_nickname = string
               store_staff_douyin_number = string
               store_staff_uid = string
               store_staff_incentive_amount = string
               store_staff_incentive_amount_rate = string
           }
       }
     }
   }
   transform {
   }
   
   sink {
       StarRocks {
           labelPrefix="seatunnel_v4"
           batch_max_rows=102500
           batch_max_bytes=52428800
           enable_upsert_delete=false
           schema_save_mode=IGNORE
           data_save_mode=DROP_DATA
           save_mode_create_template="CREATE TABLE IF NOT EXISTS `${database}`.`${table}` (\n${rowtype_primary_key},\n${rowtype_fields}\n) ENGINE=OLAP\n PRIMARY KEY (${rowtype_primary_key})\nDISTRIBUTED BY HASH (${rowtype_primary_key})PROPERTIES (\n    replication_num = 1 \n)"
           http_socket_timeout_ms=180000
           source_table_name=Table15546062651361
           table="excel_sale_detail_v2"
           database=test
           nodeUrls=[
               ""
           ]
           username=
           password=
           base-url="jdbc:mysql://:9030/test"
       }
   }
   ```
   
   
   ### Running Command
   
   ```shell
   ./bin/seatunnel.sh -DJvmOption="-Xms10G -Xmx10G" --config ./config/sale_excel.conf -e local
   ```
   
   
   ### Error Exception
   
   ```log
   ask (1/1)] end with state FAILED and Exception: java.lang.OutOfMemoryError: Java heap space
           at java.base/java.util.Arrays.copyOf(Arrays.java:3745)
           at java.base/java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:120)
           at java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:95)
           at java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:156)
           at org.apache.poi.util.IOUtils.toByteArray(IOUtils.java:185)
           at org.apache.poi.util.IOUtils.toByteArray(IOUtils.java:149)
           at org.apache.poi.util.IOUtils.toByteArray(IOUtils.java:136)
           at org.apache.poi.openxml4j.util.ZipArchiveFakeEntry.<init>(ZipArchiveFakeEntry.java:47)
           at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:53)
           at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:106)
           at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:307)
           at org.apache.poi.ooxml.util.PackageHelper.open(PackageHelper.java:47)
           at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:309)
           at org.apache.seatunnel.connectors.seatunnel.file.source.reader.ExcelReadStrategy.readProcess(ExcelReadStrategy.java:94)
           at org.apache.seatunnel.connectors.seatunnel.file.source.reader.AbstractReadStrategy.resolveArchiveCompressedInputStream(AbstractReadStrategy.java:241)
           at org.apache.seatunnel.connectors.seatunnel.file.source.reader.ExcelReadStrategy.read(ExcelReadStrategy.java:78)
           at org.apache.seatunnel.connectors.seatunnel.file.source.reader.MultipleTableFileSourceReader.pollNext(MultipleTableFileSourceReader.java:81)
           at org.apache.seatunnel.engine.server.task.flow.SourceFlowLifeCycle.collect(SourceFlowLifeCycle.java:159)
           at org.apache.seatunnel.engine.server.task.SourceSeaTunnelTask.collect(SourceSeaTunnelTask.java:127)
           at org.apache.seatunnel.engine.server.task.SeaTunnelTask.stateProcess(SeaTunnelTask.java:168)
           at org.apache.seatunnel.engine.server.task.SourceSeaTunnelTask.call(SourceSeaTunnelTask.java:132)
           at org.apache.seatunnel.engine.server.TaskExecutionService$BlockingWorker.run(TaskExecutionService.java:693)
           at org.apache.seatunnel.engine.server.TaskExecutionService$NamedTaskWrapper.run(TaskExecutionService.java:1018)
           at org.apache.seatunnel.api.tracing.MDCRunnable.run(MDCRunnable.java:39)
           at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
           at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
           at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
           at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
           at java.base/java.lang.Thread.run(Thread.java:834)
   ```
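
The trace shows the heap being exhausted while `XSSFWorkbook` is constructed from an input stream, at the point where a zip entry is copied into a byte array (`ZipArchiveFakeEntry` calling `IOUtils.toByteArray`). For comparison, here is a minimal sketch of reading the worksheet XML as a stream instead, so that only the current XML event is held in memory. It uses only the JDK (`java.util.zip` and StAX), not POI, and it is not SeaTunnel's implementation or a proposed patch. The class name `XlsxRowCounter` and the method `countRows` are hypothetical, and the sketch assumes the first entry under `xl/worksheets/` is the sheet of interest. It only counts `<row>` elements; a real reader would also need to resolve shared strings and cell types.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, for illustration only.
final class XlsxRowCounter {

    private XlsxRowCounter() {}

    /**
     * Counts the <row> elements of the first worksheet found in an .xlsx
     * stream. The zip entry is parsed incrementally with StAX, so the sheet
     * is never copied into a byte array as a whole.
     */
    static long countRows(InputStream xlsx) throws IOException, XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Worksheet XML has no DTD; disable DTD support as a precaution.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);

        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                // Skip everything except worksheet parts such as xl/worksheets/sheet1.xml.
                if (!name.startsWith("xl/worksheets/") || !name.endsWith(".xml")) {
                    continue;
                }
                // ZipInputStream reports end-of-stream at the end of the current entry,
                // so the parser stops at the end of this sheet.
                XMLStreamReader reader = factory.createXMLStreamReader(zip);
                long rows = 0;
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "row".equals(reader.getLocalName())) {
                        rows++;
                    }
                }
                reader.close(); // does not close the underlying zip stream
                return rows;
            }
        }
        return 0; // no worksheet part found
    }
}
```

Calling `XlsxRowCounter.countRows(new FileInputStream("/data/sale_detail_v2.xlsx"))` would walk the sheet once from start to end. Whether a streaming reader like this fits into `ExcelReadStrategy` is an open question for the maintainers.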
   
   
   ### Zeta or Flink or Spark Version
   
   _No response_
   
   ### Java or Scala Version
   
   _No response_
   
   ### Screenshots
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   

