wgtmac commented on code in PR #1868: URL: https://github.com/apache/orc/pull/1868#discussion_r1547135895
########## site/specification/ORCv1.md: ########## @@ -824,14 +824,31 @@ the index values and the additional value bits. bit is set, the entire value is negated. * Data values (W * L bits padded to the byte) - A sequence of W bit positive values that are added to the base value. -* Patch list (PLL * (PGW + PW) bytes) - A list of patches for values +* Patch list (PLL * ceil(PGW + PW) bits) - A list of patches for values that didn't fit within W bits. Each entry in the list consists of a gap, which is the number of elements skipped from the previous patch, and a patch value. Patches are applied by logically or'ing the data values with the relevant patch shifted W bits left. If a patch is 0, it was introduced to skip over more than 255 items. The combined length of each patch (PGW + PW) must be less or equal to - 64. + 64. (PGW + PW) is padded to the nearest fixed bit size according to the + below table before being encoded in the patch list. + +(PGW + PW) | ceil(PGW + PW) +:------------ | :------------- +1 <= x <= 24 | x +25 | 26 +26 | 26 +27 | 28 +28 | 28 +29 | 30 +30 | 30 +31 | 32 +32 | 32 +33 <= x <= 40 | 40 +41 <= x <= 48 | 48 +49 <= x <= 56 | 56 +57 <= x <= 64 | 64 Review Comment: Is it better to follow the java code to add 0 as well? https://github.com/apache/orc/blob/513922af6c1d1a07e632a03231d5b546eeafb26e/java/core/src/java/org/apache/orc/impl/SerializationUtils.java#L362-L391 ########## site/specification/ORCv1.md: ########## @@ -824,14 +824,31 @@ the index values and the additional value bits. bit is set, the entire value is negated. * Data values (W * L bits padded to the byte) - A sequence of W bit positive values that are added to the base value. -* Patch list (PLL * (PGW + PW) bytes) - A list of patches for values +* Patch list (PLL * ceil(PGW + PW) bits) - A list of patches for values that didn't fit within W bits. Each entry in the list consists of a gap, which is the number of elements skipped from the previous patch, and a patch value. Patches are applied by logically or'ing the data values with the relevant patch shifted W bits left. If a patch is 0, it was introduced to skip over more than 255 items. The combined length of each patch (PGW + PW) must be less or equal to - 64. + 64. (PGW + PW) is padded to the nearest fixed bit size according to the + below table before being encoded in the patch list. Review Comment: Could you please rename `ceil` to `closestFixedBits` or `cfb`? `ceil` does not seems to have its common meaning here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
