[GitHub] [hudi] bvaradar commented on issue #1913: [SUPPORT][MOR]Too many open files on IOException and Crash

GitBox Thu, 06 Aug 2020 08:56:12 -0700


bvaradar commented on issue #1913:
URL: https://github.com/apache/hudi/issues/1913#issuecomment-670015540



   @luffyd: I spent some time trying to understand your use case.
   
   To answer your question: Hudi needs to list partitions in order to determine
the set of valid files that make up the latest snapshot. It looks like your
use case writes to a large number of partitions, and Hudi has to list all of
them to perform the write. I checked the code and I don't think the leak is
coming from Hudi. Can you check the parquet version being used in your
runtime, as @Ares-W suggested?
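   
   One rough way to check which parquet versions are present at runtime is to
scan the directory that holds the jars on the classpath. The sketch below is
only illustrative: the function names are made up for this example, it assumes
the jars follow the usual `artifact-version.jar` naming, and the directory you
pass in (for example the Spark `jars` folder) depends on your deployment.

```python
import re
from pathlib import Path

# Matches names such as "parquet-hadoop-1.10.1.jar" and captures
# the artifact ("parquet-hadoop") and the version ("1.10.1").
_PARQUET_JAR = re.compile(r"^(parquet-[a-z0-9-]+?)-(\d[\w.-]*)\.jar$")


def find_parquet_jars(jar_dir):
    """Return {artifact: [versions]} for parquet jars found directly in jar_dir."""
    found = {}
    for jar in Path(jar_dir).glob("parquet-*.jar"):
        match = _PARQUET_JAR.match(jar.name)
        if match:
            found.setdefault(match.group(1), set()).add(match.group(2))
    return {name: sorted(versions) for name, versions in found.items()}


def conflicting_versions(found):
    """Keep only artifacts that appear with more than one version."""
    return {name: vers for name, vers in found.items() if len(vers) > 1}
```

Calling `find_parquet_jars` on the jar directory and then passing the result to
`conflicting_versions` would flag any parquet artifact that is on the classpath
in more than one version. Jars shaded into a bundle are not visible this way.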
   
   On a different note, regarding the looping: are you writing the same data to
Hudi again and again? If not, have you considered using Spark Structured
Streaming? I do see occasional compactions. On the latest master, we have
added async compaction support for Structured Streaming.
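   
   As a rough illustration of what a Structured Streaming write to a
MERGE_ON_READ table with async compaction might look like, here is a PySpark
sketch. The helper names, field names, and paths are placeholders invented for
this example, and the option keys are the datasource write configs as I
understand them, so please verify them against the configuration docs for the
Hudi version you are running.

```python
def hudi_streaming_options(table_name, record_key, partition_field, precombine_field):
    """Build Hudi writer options for a MERGE_ON_READ streaming sink.

    The field names passed in are placeholders for your own schema.
    """
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.precombine.field": precombine_field,
        # Assumed key for async compaction in the streaming sink.
        "hoodie.datasource.compaction.async.enable": "true",
    }


def start_hudi_stream(streaming_df, base_path, checkpoint_path, options):
    """Start a streaming query that writes streaming_df to a Hudi table.

    streaming_df must be a streaming Spark DataFrame; this function is not
    exercised without a Spark session and the Hudi bundle on the classpath.
    """
    return (
        streaming_df.writeStream
        .format("org.apache.hudi")
        .options(**options)
        .option("checkpointLocation", checkpoint_path)
        .outputMode("append")
        .start(base_path)
    )
```

With this shape, one long-running streaming query replaces the manual write
loop, and compaction is scheduled alongside ingestion rather than inline.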


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
