gszadovszky commented on PR #1270:
URL: https://github.com/apache/parquet-mr/pull/1270#issuecomment-2024650331

   > @gszadovszky I'm trying to switch the codecs to use 
`ParquetProperties#getPageSizeThreshold()` as the initial buffer size but am 
running into some issues with seeing how to structure that. It looks like the 
various codecs (`SnappyCodec`, `Lz4RawCodec`) are stashed in a `static final` 
map called `CODEC_BY_NAME` in `CodecFactory`. Before they are stashed in the 
map, they are configured by a Hadoop `Configuration` object. Presumably that 
needs to be consistent across the entire classloader, since the configured 
codecs are getting stashed in a `static final` map.
   > 
   > I don't see a way to get the relevant `ParquetProperties` at the time the 
codecs are created. (I'm also not sure if it even really makes sense; is 
`ParquetProperties` something that is consistent across the entire classloader 
like a Hadoop `Configuration` would be?)
   > 
   > Any suggestions are welcome. I could also go back to the approach where 
the initial buffer size isn't configurable, and hard-code it at 4KB or 1MB or 
what seems most reasonable. With the doubling-every-allocation approach 
introduced in this patch, it isn't going to be the end of the world if the 
initial size is too small.
   
   In this case I wouldn't spend too much time on actually passing the 
configured value; as you said, it might not even be possible because of the 
caching.
   I think you are right to start with a small size and let the buffer reach 
the target quickly.
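   For illustration, a minimal sketch of the doubling-every-allocation idea 
(class and method names are hypothetical, not taken from the patch): each time 
the buffer overflows, its capacity doubles, so even a small initial size 
reaches a large target in a handful of reallocations — 4 KB needs only eight 
doublings to hit 1 MB.

```java
import java.util.Arrays;

// Hypothetical sketch of doubling-on-overflow buffer growth; illustrative
// only, not the actual parquet-mr codec buffer implementation.
public class GrowableBuffer {
    private byte[] buf;
    private int size;

    public GrowableBuffer(int initialCapacity) {
        this.buf = new byte[initialCapacity];
    }

    public void write(byte[] data) {
        ensureCapacity(size + data.length);
        System.arraycopy(data, 0, buf, size, data.length);
        size += data.length;
    }

    // Double the capacity until the request fits. Starting at 4 KB, eight
    // doublings reach 1 MB, so a too-small initial size costs little.
    private void ensureCapacity(int needed) {
        int capacity = buf.length;
        while (capacity < needed) {
            capacity *= 2;
        }
        if (capacity != buf.length) {
            buf = Arrays.copyOf(buf, capacity);
        }
    }

    public int capacity() { return buf.length; }
    public int size() { return size; }
}
```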


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

