[GitHub] [orc] wgtmac commented on issue #1240: Huge memory taken for each field when exporting

via GitHub Thu, 02 Feb 2023 23:09:51 -0800


wgtmac commented on issue #1240:
URL: https://github.com/apache/orc/issues/1240#issuecomment-1415187527


   > > Hello, it seems there were commits referencing this issue. Is this issue 
now fixed?
   > 
   > @LouisClt Thanks for your follow-up.
   > 
   > We have implemented a block-based buffer called `BlockBuffer` (by 
@coderex2522) and used it to replace the output buffer in the 
`CompressionStream`. It can decrease the memory footprint to some extent.
   > 
   > IMO, the next step is to use it to replace the input buffer of the 
`CompressionStream` which has the size of `compressionBlockSize` per stream.
   
   To be precise, the `rawInputBuffer` of every `CompressionStream` is fixed at 
the compression block size, which is 1M by default. A writer with many columns 
will suffer from a large memory footprint, and nothing can currently be done to 
alleviate it.
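
   As a rough illustration of why this adds up, here is a minimal sketch of the 
arithmetic. Both helper functions are hypothetical and are not part of the ORC 
API; the stream count and the chunk size used in the example are assumptions, 
and only the 1M block size comes from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an ORC function): memory held by raw input
// buffers when every stream allocates one full compression block up front.
constexpr std::uint64_t fixedInputBufferBytes(std::uint64_t numStreams,
                                              std::uint64_t blockSizeBytes) {
  return numStreams * blockSizeBytes;
}

// Hypothetical alternative: buffers that grow in fixed-size chunks, so each
// stream holds only what it has buffered, rounded up to whole chunks.
inline std::uint64_t chunkedInputBufferBytes(std::uint64_t numStreams,
                                             std::uint64_t bytesPerStream,
                                             std::uint64_t chunkSizeBytes) {
  assert(chunkSizeBytes > 0);
  const std::uint64_t chunks =
      (bytesPerStream + chunkSizeBytes - 1) / chunkSizeBytes;  // ceiling division
  return numStreams * chunks * chunkSizeBytes;
}
```

   With an assumed 3000 streams (for example 1000 columns with 3 streams each) 
and a 1M block size, `fixedInputBufferBytes(3000, 1 << 20)` comes to 3000 MiB 
regardless of how little data each stream holds. If each stream only buffered 
about 10 KB and grew in assumed 64 KB chunks, 
`chunkedInputBufferBytes(3000, 10000, 64 * 1024)` would be roughly 188 MiB.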
   
   I have created a JIRA to track it: 
https://issues.apache.org/jira/browse/ORC-1365
   
   cc @coderex2522 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

