metsw24-max commented on PR #50089: URL: https://github.com/apache/arrow/pull/50089#issuecomment-4691301003
@pitrou done, opened GH-50162 and updated the PR title to match.

@AntoinePrv on the spec bounds: the bit width is already checked before it reaches this decoder. For dictionary indices, `DictDecoderImpl::SetData` rejects any width above 32, and for rep/def levels the width isn't read from the file at all; it's derived from `max_level`. Run lengths are bounded by the parser, which truncates or rejects a run that would overflow the buffer.

The catch is that the values overflowing here are all within spec: a single bit-packed run can validly hold close to 2^31 values, so `values_read_ * value_bit_width` exceeds INT32_MAX on legitimate data once a run grows to 256 MiB (2^31 bits) of packed data. So I don't think there's an out-of-spec value to error on at this level; the intermediate just needs the wider type, same as `raw_data_size` above. Happy to add an explicit error path if you'd rather have one.
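
To make the arithmetic concrete, here is a minimal standalone sketch of the widening, not the actual decoder code. The helper names (`BitOffset`, `OverflowsInt32`, `ByteOffset`) are hypothetical and only illustrate the idea; the only assumptions taken from the discussion above are that both operands are 32-bit and that the bit width is at most 32.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not Arrow's API): bit offset of the next value in a
// bit-packed run. The left operand is widened first, so the multiplication
// is done in 64 bits and cannot wrap for any pair of int32 inputs.
constexpr std::int64_t BitOffset(std::int32_t values_read,
                                 std::int32_t value_bit_width) {
  return static_cast<std::int64_t>(values_read) * value_bit_width;
}

// True when the same product would not fit in a 32-bit signed intermediate.
constexpr bool OverflowsInt32(std::int32_t values_read,
                              std::int32_t value_bit_width) {
  return BitOffset(values_read, value_bit_width) >
         std::numeric_limits<std::int32_t>::max();
}

// Byte offset derived from the widened bit offset. The asserts encode the
// assumption that the caller has already validated the bit width.
inline std::int64_t ByteOffset(std::int32_t values_read,
                               std::int32_t value_bit_width) {
  assert(values_read >= 0);
  assert(value_bit_width >= 0 && value_bit_width <= 32);
  return BitOffset(values_read, value_bit_width) / 8;
}

// 2^26 values at 32 bits each is exactly 2^31 bits (256 MiB), one more than
// INT32_MAX, while one value fewer still fits in 32 bits.
static_assert(BitOffset(1 << 26, 32) == (std::int64_t{1} << 31),
              "2^26 * 32 == 2^31");
static_assert(OverflowsInt32(1 << 26, 32), "in-spec input overflows int32");
static_assert(!OverflowsInt32((1 << 26) - 1, 32), "one value fewer still fits");
```

Both inputs in the overflowing case are individually valid, which is why the sketch widens the intermediate instead of rejecting the input.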
