laskoviymishka opened a new issue, #2486:
URL: https://github.com/apache/iceberg-rust/issues/2486

   Follow-up from apache/iceberg-go#1110.
   
   iceberg-rust uses `Bucket(u32)` / `Truncate(u32)`, accepting `N` up to 
`MaxUint32` (4,294,967,295). Java's reference impl uses `int`, capping at 
`Integer.MAX_VALUE` (2,147,483,647). iceberg-go just landed 
apache/iceberg-go#1110 which validates `n <= math.MaxInt32` at parse time, 
aligning with Java. iceberg-cpp uses `int32_t`, so the upper bound matches Java 
via the type. PyIceberg uses `PositiveInt` with no explicit ceiling.
   
   So among the five clients:
   
   | client | type | upper bound on `N` |
   | --- | --- | --- |
   | Java | `int` | `MaxInt32` (2,147,483,647) |
   | iceberg-go (post-#1110) | `int` + parse check | `MaxInt32` |
   | iceberg-cpp | `int32_t` | `INT32_MAX` |
   | iceberg-rust | `u32` | `MaxUint32` (4,294,967,295) |
   | PyIceberg | `PositiveInt` | unbounded |
   
   The spec doesn't bound `N` explicitly — `Integer.MAX_VALUE` only appears as 
a mask operand in the bucket formula `(murmur3_x86_32_hash(x) & 
Integer.MAX_VALUE) % N`, not as a ceiling. So `N > Integer.MAX_VALUE` is 
undefined per-spec rather than disallowed, but Java is the de facto reference 
and three of the four other clients match.
   
   Worth noting: iceberg-cpp's `TransformFromString` parses via 
`ParseNumber<int32_t>` (rejects overflow at parse time via the type), but the 
`Bucket` / `Truncate` constructors don't validate `> 0`, so `bucket[0]` parses 
cleanly and only fails later at runtime — same shape as pre-#1110 iceberg-go. 
Probably a separate small issue worth filing on cpp's side.
   
   Doesn't bite today — no real table is plausibly partitioned with 2–4 billion 
buckets. But rust is currently the only client that *could* author a spec in 
`(MaxInt32, MaxUint32]`, and such a spec would throw `NumberFormatException` in 
Java and fail to parse in iceberg-go post-#1110.
   
   If `u32` is intentional, a doc comment near `Bucket` / `Truncate` capturing 
the divergence would help future readers not read it as a parity gap. If it's 
incidental, narrowing to `i32` (or a `NonZero` newtype) would converge with the 
reference impl.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to