gianm commented on PR #1270: URL: https://github.com/apache/parquet-mr/pull/1270#issuecomment-2023635998
@gszadovszky I'm trying to switch the codecs to use `ParquetProperties#getPageSizeThreshold()` as the initial buffer size, but I can't see how to structure that.

It looks like the various codecs (`SnappyCodec`, `Lz4RawCodec`) are stashed in a `static final` map called `CODEC_BY_NAME` in `CodecFactory`. Before they go into the map, they are configured with a Hadoop `Configuration` object. Because the configured codecs are cached in a `static final` map, that configuration presumably needs to be consistent across the entire classloader. I don't see a way to get the relevant `ParquetProperties` at the time the codecs are created. (I'm also not sure it even makes sense: is `ParquetProperties` something that is consistent across the entire classloader, the way a Hadoop `Configuration` would be?)

Any suggestions are welcome. I could also go back to the approach where the initial buffer size isn't configurable and hard-code it at 4 KB, 1 MB, or whatever seems most reasonable. With the doubling-every-allocation approach introduced in this patch, it isn't going to be the end of the world if the initial size is too small.
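
To illustrate why a too-small initial size is tolerable under doubling, here is a minimal sketch. It is not the code in this patch: `DoublingBuffer` and all of its methods are hypothetical names made up for this example, and it uses a plain `byte[]` rather than whatever buffer type the codecs actually use.

```java
import java.util.Arrays;

/** Hypothetical growable byte buffer for illustration; not a parquet-mr class. */
final class DoublingBuffer {
    private byte[] data;
    private int size = 0;
    private int reallocations = 0;

    DoublingBuffer(int initialCapacity) {
        if (initialCapacity <= 0) {
            throw new IllegalArgumentException("initialCapacity must be positive");
        }
        this.data = new byte[initialCapacity];
    }

    /** Appends src, growing the backing array first if it does not fit. */
    void write(byte[] src) {
        long required = (long) size + src.length;
        if (required > data.length) {
            grow(required);
        }
        System.arraycopy(src, 0, data, size, src.length);
        size += src.length;
    }

    /** Doubles the capacity until it is at least `required` bytes. */
    private void grow(long required) {
        long newCapacity = data.length; // long, so the doubling cannot overflow int
        while (newCapacity < required) {
            newCapacity *= 2;
        }
        if (newCapacity > Integer.MAX_VALUE) {
            throw new IllegalStateException("buffer would exceed the maximum array size");
        }
        data = Arrays.copyOf(data, (int) newCapacity);
        reallocations++;
    }

    int size() {
        return size;
    }

    int capacity() {
        return data.length;
    }

    int reallocations() {
        return reallocations;
    }
}
```

In this sketch, starting at 4 KiB and writing 1 MiB in 1 KiB chunks triggers 8 reallocations (4 KiB × 2^8 = 1 MiB) and copies 1020 KiB in total across all of them, which is less than the payload itself. That is the sense in which a small hard-coded initial size costs little.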
