alamb commented on issue #7243: URL: https://github.com/apache/arrow-rs/issues/7243#issuecomment-2708228389
> To merge the third dictionary, the current state is flattened. This has the effect of allocating 600*4MB chunks in the buffer for the nulls, which overflows the i32 size. This seems very non ideal -- and maybe for this particular corner case we could make the allocations less crazy. But I see that this is just one edge case

> I think maybe there's also a possible mitigation in the `parquet` crate, to add a reader option to not merge row groups? That might be a nice efficiency win for decoding.

I think this option sounds very reasonable to me and gives better control to the user.

cc @tustvold
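For illustration, here is a rough sketch of how such an option might look from the user side (600 * 4 MB is roughly 2.4 GB, which is past `i32::MAX` of about 2.1 GB, hence the overflow). The `with_merge_dictionaries` name is hypothetical and does not exist in the `parquet` crate today; the builder and options types around it are the current API.

```rust
// Sketch only: `with_merge_dictionaries` is a hypothetical name for the proposed
// "don't merge row group dictionaries" reader option and is NOT in the parquet
// crate today. The builder/options types and the other calls are the real API.
use std::fs::File;

use parquet::arrow::arrow_reader::{ArrowReaderOptions, ParquetRecordBatchReaderBuilder};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = File::open("data.parquet")?;

    // Existing API: build reader options as usual.
    let options = ArrowReaderOptions::new();

    // Proposed (hypothetical) knob: decode each row group's dictionary
    // independently instead of accumulating and flattening a merged dictionary.
    // let options = options.with_merge_dictionaries(false);

    let reader = ParquetRecordBatchReaderBuilder::try_new_with_options(file, options)?
        .with_batch_size(8192)
        .build()?;

    for batch in reader {
        println!("read {} rows", batch?.num_rows());
    }
    Ok(())
}
```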