gianm commented on PR #1270:
URL: https://github.com/apache/parquet-mr/pull/1270#issuecomment-2023635998

   @gszadovszky I'm trying to switch the codecs to use 
`ParquetProperties#getPageSizeThreshold()` as the initial buffer size, but I'm 
having trouble seeing how to structure that. It looks like the 
various codecs (`SnappyCodec`, `Lz4RawCodec`) are stashed in a `static final` 
map called `CODEC_BY_NAME` in `CodecFactory`. Before they are stashed in the 
map, they are configured by a Hadoop `Configuration` object. Presumably that 
needs to be consistent across the entire classloader, since the configured 
codecs are getting stashed in a `static final` map.
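   A minimal sketch of why a static, name-keyed cache can't carry per-writer settings. This is not parquet-mr's actual `CodecFactory` code: `CodecCacheSketch`, `ConfiguredCodec`, and `getCodec` are hypothetical, and the initial buffer size stands in for whatever setting we'd want to pass through. Whichever caller populates the entry first fixes the configuration for everyone else in the classloader.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch (not parquet-mr code): a process-wide codec cache keyed only by
 * codec name, mirroring the static final CODEC_BY_NAME shape described above.
 */
public class CodecCacheSketch {

    /** Hypothetical stand-in for a configured codec such as SnappyCodec. */
    static final class ConfiguredCodec {
        final String name;
        final int initialBufferSize;

        ConfiguredCodec(String name, int initialBufferSize) {
            this.name = name;
            this.initialBufferSize = initialBufferSize;
        }
    }

    // Static and keyed by name only: per-writer settings are not part of the key.
    private static final Map<String, ConfiguredCodec> CODEC_BY_NAME = new ConcurrentHashMap<>();

    static ConfiguredCodec getCodec(String name, int initialBufferSize) {
        // computeIfAbsent runs the factory only on the first lookup for this name,
        // so later callers silently receive the first caller's configuration.
        return CODEC_BY_NAME.computeIfAbsent(name, n -> new ConfiguredCodec(n, initialBufferSize));
    }

    public static void main(String[] args) {
        ConfiguredCodec first = getCodec("snappy", 4 * 1024);
        ConfiguredCodec second = getCodec("snappy", 1024 * 1024);
        System.out.println(first == second);          // true
        System.out.println(second.initialBufferSize); // 4096
    }
}
```

   To make a per-writer value like the page size threshold reach the codec, it would have to become part of the cache key or be supplied at compression time rather than at construction time.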
   
   I don't see a way to get the relevant `ParquetProperties` at the time the 
codecs are created. (I'm also not sure if it even really makes sense; is 
`ParquetProperties` something that is consistent across the entire classloader 
like a Hadoop `Configuration` would be?)
   
   Any suggestions are welcome. I could also go back to the approach where the 
initial buffer size isn't configurable and hard-code it at 4KB, 1MB, or whatever 
seems most reasonable. With the doubling-every-allocation approach introduced 
in this patch, it isn't going to be the end of the world if the initial size is 
too small.
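   A minimal sketch of why a too-small initial size is cheap under doubling growth. This is an illustration, not the code in this PR; `DoublingBufferSketch` and its methods are hypothetical. Starting at 4 KiB and writing 1 MiB in 4 KiB chunks costs only 8 reallocations (4K, 8K, ..., 512K, 1M), i.e. log2(1 MiB / 4 KiB).

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not the PR's implementation): a byte buffer that doubles its
 * capacity whenever a write does not fit, so a small starting size only adds a
 * logarithmic number of reallocations.
 */
public class DoublingBufferSketch {
    private byte[] data;
    private int size;
    private int reallocations;

    DoublingBufferSketch(int initialCapacity) {
        if (initialCapacity <= 0) {
            throw new IllegalArgumentException("initialCapacity must be positive");
        }
        data = new byte[initialCapacity];
    }

    void write(byte[] src, int off, int len) {
        ensureCapacity(size + len);
        System.arraycopy(src, off, data, size, len);
        size += len;
    }

    private void ensureCapacity(int required) {
        if (required <= data.length) {
            return;
        }
        int newCapacity = data.length;
        while (newCapacity < required) {
            // Double, clamping at Integer.MAX_VALUE to avoid int overflow.
            newCapacity = newCapacity > Integer.MAX_VALUE / 2 ? Integer.MAX_VALUE : newCapacity * 2;
        }
        data = Arrays.copyOf(data, newCapacity); // one reallocation, however many doublings
        reallocations++;
    }

    int capacity() { return data.length; }
    int size() { return size; }
    int reallocations() { return reallocations; }

    public static void main(String[] args) {
        DoublingBufferSketch buf = new DoublingBufferSketch(4 * 1024); // start at 4 KiB
        byte[] chunk = new byte[4 * 1024];
        for (int i = 0; i < 256; i++) {                               // 256 * 4 KiB = 1 MiB
            buf.write(chunk, 0, chunk.length);
        }
        System.out.println(buf.capacity());      // 1048576
        System.out.println(buf.reallocations()); // 8
    }
}
```

   The trade-off is that a 1MB starting size avoids those copies but allocates the full megabyte even for tiny pages, while a 4KB start pays a handful of copies only when pages are actually large.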


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
