alamb commented on issue #7243: URL: https://github.com/apache/arrow-rs/issues/7243#issuecomment-2708228389
> To merge the third dictionary, the current state is flattened. This has the effect of allocating 600*4MB chunks in the buffer for the nulls, which overflows the i32 size. This seems very non ideal -- and maybe for this particular corner case we could make the allocations less crazy. But I see that this is just one edge case

> I think maybe there's also a possible mitigation in the `parquet` crate, to add a reader option to not merge row groups? That might be a nice efficiency win for decoding.

I think this option sounds very reasonable to me and gives better control to the user.

cc @tustvold
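For illustration, here is a rough sketch of how such an option might look from the user side (600 * 4 MB is roughly 2.4 GB, which is past `i32::MAX` of about 2.1 GB, hence the overflow). The `with_merge_dictionaries` name is hypothetical and does not exist in the `parquet` crate today; the builder and options types around it are the current API.

```rust
// Sketch only: `with_merge_dictionaries` is a hypothetical name for the proposed
// "don't merge row group dictionaries" reader option and is NOT in the parquet
// crate today. The builder/options types and the other calls are the real API.
use std::fs::File;

use parquet::arrow::arrow_reader::{ArrowReaderOptions, ParquetRecordBatchReaderBuilder};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = File::open("data.parquet")?;

    // Existing API: build reader options as usual.
    let options = ArrowReaderOptions::new();

    // Proposed (hypothetical) knob: decode each row group's dictionary
    // independently instead of accumulating and flattening a merged dictionary.
    // let options = options.with_merge_dictionaries(false);

    let reader = ParquetRecordBatchReaderBuilder::try_new_with_options(file, options)?
        .with_batch_size(8192)
        .build()?;

    for batch in reader {
        println!("read {} rows", batch?.num_rows());
    }
    Ok(())
}
```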