fallintoplace opened a new issue, #1111:
URL: https://github.com/apache/iceberg-go/issues/1111

   Partitioned writes currently derive partition keys from the input Arrow 
record before `ToRequestedSchema` normalizes timestamp units to the table 
schema.
   
   The write path eventually casts Arrow `timestamp[s]` and `timestamp[ms]` 
values to Iceberg microsecond timestamps before writing data files, but 
partition routing happens earlier:
   
   ```go
   partitions, err := p.getPartitions(record)
   ```
   
   `getRecordPartitions` then calls `getArrowValueAsIcebergLiteral`, which 
currently casts the raw Arrow timestamp value directly to an Iceberg 
microsecond timestamp:
   
   ```go
   case *array.Timestamp:
       return iceberg.NewLiteral(iceberg.Timestamp(arr.Value(row))), nil
   ```
   
   That ignores `arr.DataType().(*arrow.TimestampType).Unit`. For example, an 
Arrow `timestamp[ms]` value of `1700000000000` represents milliseconds since 
epoch (2023-11-14 UTC), but this path treats it as microseconds (1970-01-20 
UTC) while computing day/hour partitions. The data values are later written 
correctly after schema projection, but the partition metadata/path has already 
been computed with the wrong unit.
   
   This can route rows to the wrong year/month/day/hour partition and break 
partition pruning on tables written this way.
   
   Proposed direction:
   
   - keep accepting Arrow `timestamp[s]` and `timestamp[ms]`; existing 
projection code supports casting them to Iceberg microseconds
   - make partition-key extraction convert Arrow timestamp values according to 
the Arrow unit and the Iceberg source field type
   - return microsecond literals for `timestamp` / `timestamptz` sources
   - return nanosecond literals for `timestamp_ns` / `timestamptz_ns` sources
   - add regression coverage for partitioned fanout writes using 
non-microsecond timestamp input
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
