[GitHub] [iceberg] BsoBird opened a new issue, #7514: About ICEBERG's experience in using large data tables

via GitHub Wed, 03 May 2023 06:37:28 -0700


BsoBird opened a new issue, #7514:
URL: https://github.com/apache/iceberg/issues/7514


   ### Query engine
   
   spark 3.3.2
   iceberg 1.2.1
   
   ### Question
   
   I have a table with nearly 40 billion rows, about 10 TB in size.
   I want to MERGE incremental data into this base table every day; the 
daily increment is close to 30 million rows (3000W).
   I have tested three layouts: an unpartitioned table, a table partitioned 
by year-month, and a table partitioned by tenant. In each case, MERGE 
performance is poor.
   Currently, a single MERGE takes 3-5 hours, and memory often runs low, 
causing tasks to fail.
   Executor memory is set to 16G, and dynamic resource allocation is enabled.
   How can I MERGE the incremental data into the base table more quickly?
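
   To make the question concrete, a daily MERGE of this kind might look 
roughly like the sketch below. It is not the actual job: the table names 
`db.base_table` and `db.daily_increment`, the key column `id`, and the helper 
`build_merge_sql` are all hypothetical, and the update/insert clauses assume 
the increment has the same columns as the base table.

```python
from typing import Sequence


def build_merge_sql(target: str, source: str, key_columns: Sequence[str]) -> str:
    """Build a Spark SQL MERGE INTO statement that upserts `source` into `target`.

    All identifiers passed in are placeholders for illustration only.
    """
    if not key_columns:
        raise ValueError("at least one key column is required")
    # Match rows on every key column; t = base table, s = daily increment.
    on_clause = " AND ".join(f"t.{col} = s.{col}" for col in key_columns)
    return (
        f"MERGE INTO {target} t\n"
        f"USING {source} s\n"
        f"ON {on_clause}\n"
        "WHEN MATCHED THEN UPDATE SET *\n"
        "WHEN NOT MATCHED THEN INSERT *"
    )


# Hypothetical table and column names, not taken from the real job.
merge_sql = build_merge_sql("db.base_table", "db.daily_increment", ["id"])
print(merge_sql)
```

   The resulting string would be passed to `spark.sql(...)` on a session that 
has the Iceberg SQL extensions enabled.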


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
