wgtmac commented on issue #3102: URL: https://github.com/apache/parquet-java/issues/3102#issuecomment-2566067269
In general, dictionary encoding consumes a lot of memory because it buffers all entries, so yes, `withDictionaryEncoding(false)` is the right approach to reduce the memory footprint in your case. As for the resulting file size and read performance, that depends on the data distribution and repetition.

> Each file has a fixed 500 rows

I'd say Parquet is not designed for data that small, where the metadata overhead is non-trivial. It is more suitable for 100,000+ rows of data, which can actually benefit from the columnar encoding and compression.
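
For illustration, here is a minimal sketch (not from the issue itself) of disabling dictionary encoding on a `ParquetWriter` builder, assuming an Avro-based writer; the schema, output path, and record contents are placeholders:

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class NoDictionaryWriteExample {
  public static void main(String[] args) throws Exception {
    // Placeholder schema for a simple two-column record.
    Schema schema = SchemaBuilder.record("Row").fields()
        .requiredLong("id")
        .requiredString("payload")
        .endRecord();

    try (ParquetWriter<GenericRecord> writer =
        AvroParquetWriter.<GenericRecord>builder(new Path("/tmp/no-dict.parquet"))
            .withSchema(schema)
            .withDictionaryEncoding(false)                 // skip dictionary buffering entirely
            .withCompressionCodec(CompressionCodecName.SNAPPY)
            .build()) {
      // e.g. one small file with a fixed 500 rows, as in the question.
      for (long i = 0; i < 500; i++) {
        GenericRecord record = new GenericData.Record(schema);
        record.put("id", i);
        record.put("payload", "value-" + i);
        writer.write(record);
      }
    }
  }
}
```

Note that the builder also offers a per-column overload, `withDictionaryEncoding(String columnPath, boolean enable)`, if only specific high-cardinality columns are causing the memory pressure.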
