fallintoplace opened a new issue, #1111:
URL: https://github.com/apache/iceberg-go/issues/1111
Partitioned writes currently derive partition keys from the input Arrow
record before `ToRequestedSchema` normalizes timestamp units to the table
schema.
The write path eventually casts Arrow `timestamp[s]` and `timestamp[ms]`
values to Iceberg microsecond timestamps before writing data files, but
partition routing happens earlier:
```go
partitions, err := p.getPartitions(record)
```
`getRecordPartitions` then calls `getArrowValueAsIcebergLiteral`, which
currently casts the raw Arrow timestamp value directly to an Iceberg
microsecond timestamp:
```go
case *array.Timestamp:
	return iceberg.NewLiteral(iceberg.Timestamp(arr.Value(row))), nil
```
That ignores `arr.DataType().(*arrow.TimestampType).Unit`. For example, an
Arrow `timestamp[ms]` value of `1700000000000` represents milliseconds since
the epoch (2023-11-14T22:13:20Z), but this path treats it as microseconds
(1970-01-20T16:13:20Z) while computing day/hour partitions. The data values
are later written correctly after schema projection, but the partition
metadata and path have already been computed with the wrong unit.
This can route rows to the wrong year/month/day/hour partition and break
partition pruning when reading data produced by partitioned writes.
Proposed direction:
- keep accepting Arrow `timestamp[s]` and `timestamp[ms]`; existing
projection code supports casting them to Iceberg microseconds
- make partition-key extraction convert Arrow timestamp values according to
the Arrow unit and the Iceberg source field type
- return microsecond literals for `timestamp` / `timestamptz` sources
- return nanosecond literals for `timestamp_ns` / `timestamptz_ns` sources
- add regression coverage for partitioned fanout writes using
non-microsecond timestamp input
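The conversion described in the proposed direction could look roughly like the sketch below. It is written under stated assumptions and is not the iceberg-go implementation: `TimeUnit` is a hypothetical local mirror of the four Arrow units so the example has no external dependencies, and `toIcebergTimestamp` and its `targetNanos` flag are hypothetical names standing in for "the Iceberg source field is `timestamp_ns` / `timestamptz_ns`". Returning an error on int64 overflow and on lossy nanosecond-to-microsecond input are also assumptions; the issue does not specify either case.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// TimeUnit is a hypothetical stand-in mirroring the Arrow timestamp units.
type TimeUnit int

const (
	Second TimeUnit = iota
	Millisecond
	Microsecond
	Nanosecond
)

var (
	errOverflow = errors.New("timestamp out of range for target unit")
	errLossy    = errors.New("conversion would drop sub-microsecond precision")
)

// nanosPer reports how many nanoseconds one tick of the unit represents.
func nanosPer(u TimeUnit) int64 {
	switch u {
	case Second:
		return 1_000_000_000
	case Millisecond:
		return 1_000_000
	case Microsecond:
		return 1_000
	case Nanosecond:
		return 1
	default:
		panic("unknown time unit")
	}
}

// toIcebergTimestamp (hypothetical) converts a raw Arrow timestamp value into
// the tick count of the Iceberg source field: microseconds for
// timestamp/timestamptz, nanoseconds for timestamp_ns/timestamptz_ns.
func toIcebergTimestamp(v int64, unit TimeUnit, targetNanos bool) (int64, error) {
	target := int64(1_000) // nanoseconds per microsecond
	if targetNanos {
		target = 1
	}
	src := nanosPer(unit)
	if src >= target {
		// Coarser (or equal) Arrow unit into a finer Iceberg unit: multiply,
		// rejecting values whose product would overflow int64.
		factor := src / target
		if v > math.MaxInt64/factor || v < math.MinInt64/factor {
			return 0, errOverflow
		}
		return v * factor, nil
	}
	// Only ns -> us reaches here; refuse rather than silently truncate.
	factor := target / src
	if v%factor != 0 {
		return 0, errLossy
	}
	return v / factor, nil
}

func main() {
	us, _ := toIcebergTimestamp(1700000000000, Millisecond, false)
	ns, _ := toIcebergTimestamp(1700000000, Second, true)
	fmt.Println(us, ns) // prints "1700000000000000 1700000000000000000"
}
```

In the real write path the unit would come from `arr.DataType().(*arrow.TimestampType).Unit` and the target precision from the Iceberg source field type; the resulting value would then be wrapped in the matching microsecond or nanosecond literal before the partition transform runs.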