kazdy commented on issue #8261:
URL: https://github.com/apache/hudi/issues/8261#issuecomment-1480900746

   The issue here is that Hudi reads all files under the `.hoodie/archived` 
directory, and the number of files to read grows with every archived commit.
   
   The workaround is to clean the `.hoodie/archived` directory frequently (or 
move its files to another directory).
   Some users have enabled an S3 lifecycle rule to expire objects under this 
prefix.
   I have not tried it myself, as I don't want to remove anything manually.
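   As a sketch of the lifecycle-rule approach mentioned above, something like 
the following could be set up with the AWS CLI. The bucket name, table path, 
and 30-day expiration window are placeholders, not values from this issue; 
verify the prefix matches your table layout before enabling it, since expired 
objects are deleted permanently.

   ```shell
   # Hypothetical example: expire archived-timeline files after 30 days.
   # "my-bucket" and "path/to/table" are placeholders.
   aws s3api put-bucket-lifecycle-configuration \
     --bucket my-bucket \
     --lifecycle-configuration '{
       "Rules": [{
         "ID": "expire-hudi-archived-timeline",
         "Status": "Enabled",
         "Filter": {"Prefix": "path/to/table/.hoodie/archived/"},
         "Expiration": {"Days": 30}
       }]
     }'
   ```

   Note that the rule applies per prefix, so multi-table buckets need one rule 
per table (or a shared parent prefix).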
   
   You can also run Hive sync in a separate job once a day so that new 
partitions are added without affecting your data writing. But after some time 
this, too, will become slow and use more memory.
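   A minimal sketch of running Hive sync as its own scheduled job, assuming 
the `run_sync_tool.sh` wrapper that ships in Hudi's `hudi-sync/hudi-hive-sync` 
module; the JDBC URL, credentials, bucket path, and table/partition names 
below are placeholders, so adjust them to your deployment and Hudi version.

   ```shell
   # Hypothetical daily cron job: sync new partitions to the Hive metastore
   # out of band, instead of inline with every write. All values are
   # placeholders for illustration.
   ./run_sync_tool.sh \
     --jdbc-url jdbc:hive2://hiveserver:10000 \
     --user hive \
     --pass '' \
     --base-path s3://my-bucket/path/to/table \
     --database default \
     --table my_table \
     --partitioned-by partition_col
   ```

   Decoupling sync from the writer keeps ingestion latency stable, at the cost 
of queries seeing new partitions only after the next sync run.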

