gianm commented on PR #1270: URL: https://github.com/apache/parquet-mr/pull/1270#issuecomment-1977765562
> I agree with @wgtmac's concern about the expected size. For compression/decompression we are targeting the page size. The page size is limited by two configs, `parquet.page.size` and `parquet.page.row.count.limit`. (See details [here](https://github.com/apache/parquet-mr/tree/master/parquet-hadoop).) One may configure both to higher values but it does not really make sense to have 64M pages.

I did encounter these in the real world, although it's always possible that they were built with some abnormally large values for some reason.

> I would not use a hadoop config for the default size of compression buffers. Hadoop typically compresses whole files. Probably the default page size would be a better choice here.

I'm ok with doing whichever. FWIW, the setting `io.file.buffer.size` I used in the most recent patch (which was recommended here: https://github.com/apache/parquet-mr/pull/1270#discussion_r1493591742) defaults to 4096 bytes.

I am not really a Parquet expert, so I am willing to use whatever y'all recommend. Is there another property that would be better?
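For reference, a sketch of the three properties under discussion as a Hadoop configuration fragment. The values shown are illustrative: `io.file.buffer.size` defaulting to 4096 bytes is stated above, and the 1 MB / 20,000-row defaults are the usual parquet-hadoop defaults, which may differ by version.

```xml
<!-- Sketch only: example values, not a recommendation. -->
<configuration>
  <!-- Upper bound on page size in bytes (parquet-hadoop; default is typically 1 MB). -->
  <property>
    <name>parquet.page.size</name>
    <value>1048576</value>
  </property>
  <!-- Upper bound on rows per page (parquet-hadoop; default is typically 20000). -->
  <property>
    <name>parquet.page.row.count.limit</name>
    <value>20000</value>
  </property>
  <!-- Generic Hadoop I/O buffer size used by the current patch; defaults to 4096 bytes. -->
  <property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
  </property>
</configuration>
```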
