metsw24-max commented on PR #50089:
URL: https://github.com/apache/arrow/pull/50089#issuecomment-4691301003

   @pitrou done, opened GH-50162 and renamed the title to match.
   
   @AntoinePrv on the spec bounds: the bit width is already checked before it 
reaches this decoder. For dictionary indices, `DictDecoderImpl::SetData` 
rejects any width above 32, and for rep/def levels the width isn't read from 
the file at all: it's derived from `max_level`. Run lengths are bounded by the 
parser, which truncates or rejects a run that would overflow the buffer. The 
catch is that the values overflowing here are all within spec: a single 
bit-packed run can validly hold close to 2^31 values, so `values_read_ * 
value_bit_width` (a bit count) exceeds INT32_MAX on legitimate data once a run 
spans 2^31 bits, i.e. 256 MiB. So I don't think there's an out-of-spec value 
to error on at this level; 
the intermediate just needs the wider type, same as `raw_data_size` above. 
Happy to add an explicit error path if you'd rather have one.
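
For illustration, a minimal sketch of the arithmetic described above. The helper names `BitsConsumed` and `FitsInInt32` are hypothetical and are not taken from the decoder; the sketch only assumes that the value count and the bit width are both 32-bit signed integers, and shows that widening one operand before the multiply keeps the product exact.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not the actual Arrow code): number of bits covered by
// `values_read` values of `value_bit_width` bits each. Casting one operand to
// int64_t first makes the multiplication happen in 64 bits, so the result is
// exact for any pair of non-negative 32-bit inputs.
constexpr int64_t BitsConsumed(int32_t values_read, int32_t value_bit_width) {
  return static_cast<int64_t>(values_read) * value_bit_width;
}

// Hypothetical helper: true if a bit count would still fit in a 32-bit
// signed intermediate.
constexpr bool FitsInInt32(int64_t bits) {
  return bits <= std::numeric_limits<int32_t>::max();
}

// 2^28 values at 8 bits each is exactly 2^31 bits (256 MiB), one past
// INT32_MAX, even though both inputs are well within range on their own.
static_assert(!FitsInInt32(BitsConsumed(1 << 28, 8)),
              "a 256 MiB run no longer fits in a 32-bit bit count");
```

Both inputs in the `static_assert` are valid on their own; only the narrow intermediate is the problem, which is why widening it is enough and no input needs to be rejected.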


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
