mzzz-zzm commented on issue #1027:
URL: https://github.com/apache/iceberg-go/issues/1027#issuecomment-4427213619
## How the Java reference implementation handles this
Confirmed by reading `apache/iceberg` Java source. The invariant is:
> Precision is **always** sourced from the column's
`DecimalType.precision()`,
> never from the value. It flows schema → writer → fixed buffer.
The pipeline:
### 1. Schema construction encodes precision directly
`core/src/main/java/org/apache/iceberg/avro/TypeToSchema.java` (~line 268):
```java
case DECIMAL:
Types.DecimalType decimal = (Types.DecimalType) primitive;
primitiveSchema =
LogicalTypes.decimal(decimal.precision(), decimal.scale())
.addToSchema(
Schema.createFixed(
"decimal_" + decimal.precision() + "_" + decimal.scale(),
null, null,
TypeUtil.decimalRequiredBytes(decimal.precision())));
```
The Avro `fixed[N]` length is fixed at schema-build time to
`decimalRequiredBytes(precision)`. Same `DecimalRequiredBytes` lookup table as
in `iceberg-go`.
### 2. Writer is constructed from the schema's LogicalType
`core/src/main/java/org/apache/iceberg/avro/BaseWriteBuilder.java` (~line
92):
```java
case "decimal":
LogicalTypes.Decimal decimal = (LogicalTypes.Decimal) logicalType;
return ValueWriters.decimal(decimal.getPrecision(), decimal.getScale());
```
Precision is read off the *schema's logical type*, not the value.
### 3. Writer pre-allocates the fixed buffer once, off precision
`core/src/main/java/org/apache/iceberg/avro/ValueWriters.java` (~line 367):
```java
private static class DecimalWriter implements ValueWriter<BigDecimal> {
private final int precision;
private final int scale;
private final ThreadLocal<byte[]> bytes;
private DecimalWriter(int precision, int scale) {
this.precision = precision;
this.scale = scale;
this.bytes =
ThreadLocal.withInitial(() -> new
byte[TypeUtil.decimalRequiredBytes(precision)]);
}
@Override
public void write(BigDecimal decimal, Encoder encoder) throws
IOException {
encoder.writeFixed(
DecimalUtil.toReusedFixLengthBytes(precision, scale, decimal,
bytes.get()));
}
}
```
### 4. The actual byte conversion validates and pads from the precision
`core/src/main/java/org/apache/iceberg/util/DecimalUtil.java` (~line 31):
```java
public static byte[] toReusedFixLengthBytes(
int precision, int scale, BigDecimal decimal, byte[] reuseBuf) {
Preconditions.checkArgument(
decimal.scale() == scale,
"Cannot write value as decimal(%s,%s), wrong scale: %s", precision,
scale, decimal);
Preconditions.checkArgument(
decimal.precision() <= precision,
"Cannot write value as decimal(%s,%s), too large: %s", precision,
scale, decimal);
byte[] unscaled = decimal.unscaledValue().toByteArray();
if (unscaled.length == reuseBuf.length) {
return unscaled;
}
byte fillByte = (byte) (decimal.signum() < 0 ? 0xFF : 0x00);
int offset = reuseBuf.length - unscaled.length;
for (int i = 0; i < reuseBuf.length; i += 1) {
reuseBuf[i] = (i < offset) ? fillByte : unscaled[i - offset];
}
return reuseBuf;
}
```
Note two write-time invariants that `iceberg-go` does **not** enforce today:
- `decimal.scale() == scale` — the value's scale must match the column's.
- `decimal.precision() <= precision` — the value must fit.
And the sign-extending pad (`0xFF` for negative, `0x00` for non-negative) is
also worth comparing against `iceberg-go`'s `padOrTruncateBytes`, which always
pads with `0x00`. That is a separate latent bug for negative decimals near a
byte boundary, though it may be moot if `MarshalBinary` already returns
sign-extended bytes (PR #745 era).
### Mapping to the Go fix
The Java approach maps cleanly onto the suggested `iceberg-go` fix:
| Java | Go
equivalent |
|---------------------------------------------------------------------|----------------------------------------------------------------------|
| `TypeToSchema` encodes `decimalRequiredBytes(precision)` in schema |
`internal.DecimalNode(precision, scale)` already does this |
| `BaseWriteBuilder` reads precision off the schema's logical type |
`getFieldIDMap` needs to populate `fieldIDToPrecision` from `typ.Precision` |
| `DecimalWriter` carries `precision` per field |
`convertDecimalValue` should take `precision` as a parameter |
| `toReusedFixLengthBytes(precision, scale, decimal, buf)` validates and
sign-extends | Go `padOrTruncateBytes` should size off `DRB(precision)` and
sign-extend for negatives |
The Go bug exists because `convertDecimalValue` was the only step in the
pipeline that lost track of the per-field precision — the schema already had it
(`DecimalNode` builds the fixed size from precision), but the runtime value
serializer fell back to `len(dec.String())` because no precision was plumbed
through. The Java code shows the correct shape: precision is captured once when
the writer is constructed from the schema and reused for every value.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]