Re: [I] Bug(manifest): convertDecimalValue uses string length instead of declared precision for Avro fixed-size encoding [iceberg-go]

via GitHub Mon, 11 May 2026 21:00:51 -0700


mzzz-zzm commented on issue #1027:
URL: https://github.com/apache/iceberg-go/issues/1027#issuecomment-4427213619


   ## How the Java reference implementation handles this
   
   Confirmed by reading `apache/iceberg` Java source. The invariant is:
   
   > Precision is **always** sourced from the column's 
`DecimalType.precision()`,
   > never from the value. It flows schema → writer → fixed buffer.
   
   The pipeline:
   
   ### 1. Schema construction encodes precision directly
   
   `core/src/main/java/org/apache/iceberg/avro/TypeToSchema.java` (~line 268):
   
   ```java
   case DECIMAL:
       Types.DecimalType decimal = (Types.DecimalType) primitive;
       primitiveSchema =
           LogicalTypes.decimal(decimal.precision(), decimal.scale())
               .addToSchema(
                   Schema.createFixed(
                       "decimal_" + decimal.precision() + "_" + decimal.scale(),
                       null, null,
                       TypeUtil.decimalRequiredBytes(decimal.precision())));
   ```
   
   The Avro `fixed[N]` length is fixed at schema-build time to 
`decimalRequiredBytes(precision)`. Same `DecimalRequiredBytes` lookup table as 
in `iceberg-go`.
   
   ### 2. Writer is constructed from the schema's LogicalType
   
   `core/src/main/java/org/apache/iceberg/avro/BaseWriteBuilder.java` (~line 
92):
   
   ```java
   case "decimal":
       LogicalTypes.Decimal decimal = (LogicalTypes.Decimal) logicalType;
       return ValueWriters.decimal(decimal.getPrecision(), decimal.getScale());
   ```
   
   Precision is read off the *schema's logical type*, not the value.
   
   ### 3. Writer pre-allocates the fixed buffer once, off precision
   
   `core/src/main/java/org/apache/iceberg/avro/ValueWriters.java` (~line 367):
   
   ```java
   private static class DecimalWriter implements ValueWriter<BigDecimal> {
       private final int precision;
       private final int scale;
       private final ThreadLocal<byte[]> bytes;
   
       private DecimalWriter(int precision, int scale) {
           this.precision = precision;
           this.scale = scale;
           this.bytes =
               ThreadLocal.withInitial(() -> new 
byte[TypeUtil.decimalRequiredBytes(precision)]);
       }
   
       @Override
       public void write(BigDecimal decimal, Encoder encoder) throws 
IOException {
           encoder.writeFixed(
               DecimalUtil.toReusedFixLengthBytes(precision, scale, decimal, 
bytes.get()));
       }
   }
   ```
   
   ### 4. The actual byte conversion validates and pads from the precision
   
   `core/src/main/java/org/apache/iceberg/util/DecimalUtil.java` (~line 31):
   
   ```java
   public static byte[] toReusedFixLengthBytes(
           int precision, int scale, BigDecimal decimal, byte[] reuseBuf) {
       Preconditions.checkArgument(
           decimal.scale() == scale,
           "Cannot write value as decimal(%s,%s), wrong scale: %s", precision, 
scale, decimal);
       Preconditions.checkArgument(
           decimal.precision() <= precision,
           "Cannot write value as decimal(%s,%s), too large: %s", precision, 
scale, decimal);
   
       byte[] unscaled = decimal.unscaledValue().toByteArray();
       if (unscaled.length == reuseBuf.length) {
           return unscaled;
       }
   
       byte fillByte = (byte) (decimal.signum() < 0 ? 0xFF : 0x00);
       int offset = reuseBuf.length - unscaled.length;
       for (int i = 0; i < reuseBuf.length; i += 1) {
           reuseBuf[i] = (i < offset) ? fillByte : unscaled[i - offset];
       }
       return reuseBuf;
   }
   ```
   
   Note two write-time invariants that `iceberg-go` does **not** enforce today:
   
   - `decimal.scale() == scale` — the value's scale must match the column's.
   - `decimal.precision() <= precision` — the value must fit.
   
   And the sign-extending pad (`0xFF` for negative, `0x00` for non-negative) is 
also worth comparing against `iceberg-go`'s `padOrTruncateBytes`, which always 
pads with `0x00`. That is a separate latent bug for negative decimals near a 
byte boundary, though it may be moot if `MarshalBinary` already returns 
sign-extended bytes (PR #745 era).
   
   ### Mapping to the Go fix
   
   The Java approach maps cleanly onto the suggested `iceberg-go` fix:
   
   | Java                                                                | Go 
equivalent                                                        |
   
|---------------------------------------------------------------------|----------------------------------------------------------------------|
   | `TypeToSchema` encodes `decimalRequiredBytes(precision)` in schema  | 
`internal.DecimalNode(precision, scale)` already does this           |
   | `BaseWriteBuilder` reads precision off the schema's logical type    | 
`getFieldIDMap` needs to populate `fieldIDToPrecision` from `typ.Precision` |
   | `DecimalWriter` carries `precision` per field                       | 
`convertDecimalValue` should take `precision` as a parameter         |
   | `toReusedFixLengthBytes(precision, scale, decimal, buf)` validates and 
sign-extends | Go `padOrTruncateBytes` should size off `DRB(precision)` and 
sign-extend for negatives |
   
   The Go bug exists because `convertDecimalValue` was the only step in the 
pipeline that lost track of the per-field precision — the schema already had it 
(`DecimalNode` builds the fixed size from precision), but the runtime value 
serializer fell back to `len(dec.String())` because no precision was plumbed 
through. The Java code shows the correct shape: precision is captured once when 
the writer is constructed from the schema and reused for every value.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [I] Bug(manifest): convertDecimalValue uses string length instead of declared precision for Avro fixed-size encoding [iceberg-go]

Reply via email to