wgtmac commented on issue #3102: URL: https://github.com/apache/parquet-java/issues/3102#issuecomment-2566067269
In general, dictionary encoding consumes a lot of memory because it buffers all entries, so yes, `withDictionaryEncoding(false)` is the right approach to reduce the memory footprint in your case. As for the resulting file size and read performance, that depends on the data distribution and repetition.

> Each file has a fixed 500 rows

I'd say Parquet is not designed for data that small, where the metadata overhead is non-trivial. It is more suitable for 100,000+ rows of data, which can actually benefit from the columnar encoding and compression.
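
For illustration, here is a minimal sketch (not from the issue itself) of disabling dictionary encoding on a `ParquetWriter` builder, assuming an Avro-based writer; the schema, output path, and record contents are placeholders:

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class NoDictionaryWriteExample {
  public static void main(String[] args) throws Exception {
    // Placeholder schema for a simple two-column record.
    Schema schema = SchemaBuilder.record("Row").fields()
        .requiredLong("id")
        .requiredString("payload")
        .endRecord();

    try (ParquetWriter<GenericRecord> writer =
        AvroParquetWriter.<GenericRecord>builder(new Path("/tmp/no-dict.parquet"))
            .withSchema(schema)
            .withDictionaryEncoding(false)                 // skip dictionary buffering entirely
            .withCompressionCodec(CompressionCodecName.SNAPPY)
            .build()) {
      // e.g. one small file with a fixed 500 rows, as in the question.
      for (long i = 0; i < 500; i++) {
        GenericRecord record = new GenericData.Record(schema);
        record.put("id", i);
        record.put("payload", "value-" + i);
        writer.write(record);
      }
    }
  }
}
```

Note that the builder also offers a per-column overload, `withDictionaryEncoding(String columnPath, boolean enable)`, if only specific high-cardinality columns are causing the memory pressure.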
