hhr293 commented on code in PR #12299:
URL: https://github.com/apache/gluten/pull/12299#discussion_r3412375272
##########
cpp/core/utils/tac/ffor.hpp:
##########
@@ -496,5 +521,178 @@ inline size_t decompress64(const uint8_t* input, size_t inputSize, uint64_t* out
return decompress64Impl<false, false>(input, inputSize, output);
}
+// =============================================================================
+// 128-bit codec.
+//
+// Each 128-bit value occupies a 16B slot (lo at offset 0, hi at offset 8 --
+// the DECIMAL128 / __int128_t layout used by Velox). Per block, the lo and
+// hi halves are gathered into two stack scratches and each is fed through
+// the 64-bit FFOR encoder. Reads/writes go through native uint64, so the
+// codec is byte-order agnostic as long as producer and consumer agree.
+//
+// Wire format per block: [hdr][lo payload][hdr][hi payload]
+// followed by one tail block (kBwTailMarker) carrying the remaining 16B
+// values raw.
+// =============================================================================
+
+inline constexpr size_t compress128Bound(size_t numValues) {
+ // Two 64-bit streams (lo + hi), worst case each = compress64Bound(numValues).
+ return 2 * compress64Bound(numValues);
+}
Review Comment:
compress128Bound is an allocation hint called once per partition, not a hot
path. The over-estimation for small inputs is at most a few hundred bytes
(two extra headers' worth). In the shuffle context, partitions typically
contain thousands to millions of values, so the bound is tight for the
common case.

A tighter bound would add complexity (special-casing tail-only vs.
multi-block inputs) for negligible memory savings. The current formula is
simple, correct, and never under-estimates. Over-allocating a few bytes in
edge cases is preferable to a more complex bound that risks
under-estimation bugs.
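
To make the trade-off concrete, here is a rough sketch rather than the actual
ffor.hpp code. The block size, header size, and the shape of the 64-bit bound
are all assumptions (the real compress64Bound is not part of this hunk), and
every name ending in "Sketch" is hypothetical. Under those assumptions, the
slack between the simple 2x bound and a bound that follows the wire format
is a fixed number of header bytes that does not grow with numValues.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins: the real block size and header size are defined
// elsewhere in ffor.hpp and are not shown in this hunk.
constexpr std::size_t kSketchBlockValues = 256;
constexpr std::size_t kSketchHeaderBytes = 16;

// Assumed worst case for one 64-bit stream: every value stored raw, plus one
// header per full block and one header for the tail block.
constexpr std::size_t compress64BoundSketch(std::size_t numValues) {
  return numValues * sizeof(std::uint64_t) +
      (numValues / kSketchBlockValues + 1) * kSketchHeaderBytes;
}

// Same formula as the PR: twice the 64-bit bound.
constexpr std::size_t compress128BoundSketch(std::size_t numValues) {
  return 2 * compress64BoundSketch(numValues);
}

// Hypothetical tighter bound that follows the wire format: two headers per
// full block, but only one header for the raw 16B tail block.
constexpr std::size_t tight128BoundSketch(std::size_t numValues) {
  const std::size_t fullBlocks = numValues / kSketchBlockValues;
  const std::size_t tailValues = numValues % kSketchBlockValues;
  return fullBlocks * (2 * kSketchHeaderBytes + 16 * kSketchBlockValues) +
      kSketchHeaderBytes + 16 * tailValues;
}

// Checks that the simple bound never under-estimates the tighter one and
// that the slack stays constant (one assumed header) for every input size.
inline void checkBoundSlack(std::size_t maxValues) {
  for (std::size_t n = 0; n <= maxValues; ++n) {
    assert(compress128BoundSketch(n) >= tight128BoundSketch(n));
    assert(
        compress128BoundSketch(n) - tight128BoundSketch(n) ==
        kSketchHeaderBytes);
  }
}
```

With these assumed constants the tighter bound needs a division, a modulo,
and a separate tail term to save a single header per call, which is the extra
complexity the simple formula avoids.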
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]