etseidl commented on PR #9868:
URL: https://github.com/apache/arrow-rs/pull/9868#issuecomment-4373820729

   I've tracked the regression down to increased overhead in 
`try_reserve_exact`. This shows up particularly in the page index benchmark 
because of the large number of vector reads performed while parsing the page 
index structures.
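   For context, the fallible-allocation pattern under discussion looks roughly 
like the sketch below (a minimal illustration, not the PR's actual code; the 
function name is hypothetical). `try_reserve_exact` reports allocation failure 
as an error instead of aborting, but its bookkeeping is paid on every call:

```rust
// Sketch of fallible allocation via `try_reserve_exact`. The error path
// replaces an abort, but the check itself adds per-call overhead that
// accumulates when many small vectors are read.
fn read_elements(len: usize) -> Result<Vec<u8>, String> {
    let mut buf = Vec::new();
    buf.try_reserve_exact(len)
        .map_err(|e| format!("allocation of {len} bytes failed: {e}"))?;
    // ... parsing would push up to `len` elements here ...
    Ok(buf)
}
```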
   
   I think for now we should limit the scope of this PR to fixing the 
wraparound in `read_list_begin`. 
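   The wraparound in question is the classic one: multiplying an 
attacker-controlled element count by the element size can overflow `usize`, 
yielding a small allocation followed by out-of-bounds writes. A hedged sketch 
of the kind of check involved (the helper name is hypothetical, not the PR's 
actual code):

```rust
// Detect wraparound when sizing a buffer for a thrift list: a malicious
// list header can make `count * element_size` overflow. `checked_mul`
// surfaces the overflow as an error instead of silently wrapping.
fn list_byte_len(count: usize, element_size: usize) -> Result<usize, String> {
    count
        .checked_mul(element_size)
        .ok_or_else(|| format!("list size {count} x {element_size} overflows usize"))
}
```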
   
   The question of how to handle potential OOM errors should be left to a 
larger discussion. While aborting is not ideal, it appears to be part of 
Rust's broader design philosophy. Given that the only practical response to 
OOM is to exit anyway, why place an extra burden on every allocation?
   
   As to the suggested alternate fix for this issue, namely erroring if the 
size of an allocation exceeds the size of the input buffer, that approach has 
serious problems of its own. The foremost is that the input stream is highly 
compressed, so the resulting vectors can be much larger than the input. For 
instance, `size_of::<SchemaElement>()` returns 96 IIRC, but the encoded schema 
for the wide benchmark (10,000 columns) is on the order of 138 kB. 
`96 * 10_000` is far larger than that, and could trigger the proposed OOM 
detection for a perfectly valid file.
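   The arithmetic behind that false positive can be made concrete. The check 
below is a hypothetical rendering of the proposed heuristic, using the numbers 
from this comment:

```rust
// Hypothetical version of the proposed "allocation must not exceed the
// remaining input" heuristic. Thrift encoding is compact, so decoded
// structs are far larger than their serialized form.
fn would_reject(alloc_bytes: usize, remaining_input_bytes: usize) -> bool {
    alloc_bytes > remaining_input_bytes
}
```

With the figures above, `would_reject(96 * 10_000, 138_000)` asks whether a 
960,000-byte allocation exceeds ~138,000 bytes of encoded metadata; it does, 
so the heuristic would reject a valid file.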
   
   To summarize:
   - Let's focus only on detecting wraparound in `read_list_begin`
   - Let's move discussion of OOM issues with `with_capacity` to #9874

