pitrou commented on code in PR #231:
URL: https://github.com/apache/parquet-format/pull/231#discussion_r1502427845

##########
Encodings.md:
##########
@@ -247,6 +253,15 @@
 and handled as wrapping around in 2's complement notation so that the original
 values are correctly restituted. This may require explicit care in some programming
 languages (for example by doing all arithmetic in the unsigned domain).
 
+One strategy that might be employed to avoid the above mentioned overflow is to
+perform the subtraction utilizing integers with a larger number of bits. For example,
+while encoding INT32 data one might choose to perform arithmetic operations using
+64-bit integers. This can lead to situtations where the number of bits used to encode
+the resulting deltas is greater than the number of bits used to represent the input
+values. While this behavior is allowed, data produced in this manner may not be

Review Comment:
I don't think that this behavior is (or should be) allowed. The spec should IMHO prescribe that INT32 data is encoded using deltas of at most 32 bits, and INT64 data using deltas of at most 64 bits. Emitting deltas wider than the physical bit width should be considered a bug in the encoder.
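
To make the difference concrete, here is a minimal Python sketch. It is not taken from any Parquet implementation, and `wrap32`, `deltas_wrapping`, `deltas_widened`, `packed_bit_width` and `decode_wrapping` are hypothetical names. It assumes the bit width is computed after subtracting the block's minimum delta, as DELTA_BINARY_PACKED does, and it compares wrapping 32-bit deltas with deltas computed in a wider integer type for a worst-case INT32 input.

```python
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1
MASK32 = (1 << 32) - 1


def wrap32(x):
    """Reduce x modulo 2**32 and reinterpret it as a signed 32-bit integer."""
    x &= MASK32
    return x - (1 << 32) if x > INT32_MAX else x


def deltas_wrapping(values):
    """Deltas computed with 32-bit wrap-around (2's complement) arithmetic."""
    return [wrap32(b - a) for a, b in zip(values, values[1:])]


def deltas_widened(values):
    """Deltas computed in a wider integer type, so they never wrap."""
    return [b - a for a, b in zip(values, values[1:])]


def packed_bit_width(deltas, mask=None):
    """Bits needed to bit-pack the deltas after subtracting the minimum delta.

    If `mask` is given, the subtraction itself wraps to that width.
    """
    min_delta = min(deltas)
    adjusted = [d - min_delta for d in deltas]
    if mask is not None:
        adjusted = [a & mask for a in adjusted]
    return max(adjusted).bit_length()


def decode_wrapping(first, deltas):
    """Rebuild the values by adding deltas with 32-bit wrap-around."""
    out = [first]
    for d in deltas:
        out.append(wrap32(out[-1] + d))
    return out


values = [INT32_MIN, INT32_MAX, INT32_MIN]

narrow = deltas_wrapping(values)   # [-1, 1]
wide = deltas_widened(values)      # [4294967295, -4294967295]

print(packed_bit_width(narrow, MASK32))  # 2
print(packed_bit_width(wide))            # 33

# The wrapping deltas still restore the original values exactly.
assert decode_wrapping(values[0], narrow) == values
```

With wrap-around arithmetic the deltas never need more than 32 bits and still round-trip, whereas the widened computation needs 33 bits for the same INT32 input, which is the case I think the spec should rule out.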
