wgtmac commented on issue #1240: URL: https://github.com/apache/orc/issues/1240#issuecomment-1415187527
> > Hello, it seems there were commits referencing this issue. Is this issue now fixed?
>
> @LouisClt Thanks for your follow-up.
>
> We have implemented a block-based buffer called `BlockBuffer` (by @coderex2522) and used it to replace the output buffer in the `CompressionStream`. It can decrease the memory footprint to some extent.
>
> IMO, the next step is to use it to replace the input buffer of the `CompressionStream`, which has a size of `compressionBlockSize` per stream.

To be precise, the `rawInputBuffer` of every `CompressionStream` is fixed to the compression block size, which is 1M by default. A writer with many columns will suffer from a large memory footprint, and nothing can currently be done to alleviate it. I have created a JIRA to track it: https://issues.apache.org/jira/browse/ORC-1365

cc @coderex2522
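To give a rough sense of scale, here is a back-of-the-envelope sketch of how the fixed-size input buffers add up. `estimateInputBufferBytes` is a hypothetical helper and not part of the ORC API, and the number of streams per column is an assumption, since it depends on the column type and encoding.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the ORC API): estimates the memory held by
// the fixed-size input buffers when every compressed stream owns one buffer
// of compressionBlockSize bytes.
constexpr std::uint64_t estimateInputBufferBytes(std::uint64_t numColumns,
                                                 std::uint64_t streamsPerColumn,
                                                 std::uint64_t compressionBlockSize) {
  // One input buffer per stream, each fixed to the compression block size.
  return numColumns * streamsPerColumn * compressionBlockSize;
}
```

For example, assuming 1000 columns, 2 streams per column, and a 1M block size, `estimateInputBufferBytes(1000, 2, 1024 * 1024)` gives 2,097,152,000 bytes, i.e. about 2 GB held by input buffers alone, regardless of how much data has actually been written.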
