q8webmaster opened a new pull request, #8230:
URL: https://github.com/apache/paimon/pull/8230

   ## What does this PR do?
   
   Paimon currently emits a Parquet `TIMESTAMP(isAdjustedToUTC, unit=MILLIS)` 
annotation for `TIMESTAMP` and `TIMESTAMP_WITH_LOCAL_TIME_ZONE` columns with 
precision ≤ 3.
   
   The [Iceberg v2 specification](https://iceberg.apache.org/spec/#parquet) 
only permits `INT64 MICROS` for the `timestamp` and `timestamptz` logical 
types. `MILLIS` is only valid under Iceberg v3.
   
   This causes Iceberg-aware query engines to reject Paimon-written Parquet 
files for any such column with a schema compatibility error similar to:
   
   ```
   Field ts's type INT64 in parquet file … is incompatible with type
   timestamp(6) with time zone defined in table schema
   ```
   
   ## Root cause
   
   `ParquetSchemaConverter.createTimestampWithLogicalType` selects
`TimeUnit.MILLIS` for `precision <= 3` and `TimeUnit.MICROS` for
`3 < precision <= 6`. The Iceberg v2 spec does not allow `MILLIS` for these
columns; it requires `MICROS`.
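
   The selection logic described above can be illustrated with a minimal stand-alone sketch. It is not Paimon's code. `Unit` is a hypothetical stand-in for Parquet's `LogicalTypeAnnotation.TimeUnit`, and `oldUnit`/`newUnit` are hypothetical helpers that model the behaviour before and after this PR. The sketch only covers the 0 to 6 precision range discussed here.

   ```java
   /** Hypothetical stand-in for Parquet's LogicalTypeAnnotation.TimeUnit (sketch only). */
   enum Unit { MILLIS, MICROS }

   final class TimestampUnitSelection {

       /** Before this PR: MILLIS for precision <= 3, MICROS for 3 < precision <= 6. */
       static Unit oldUnit(int precision) {
           checkRange(precision);
           return precision <= 3 ? Unit.MILLIS : Unit.MICROS;
       }

       /** After this PR: MICROS for every precision in 0..6, as Iceberg v2 expects. */
       static Unit newUnit(int precision) {
           checkRange(precision);
           return Unit.MICROS;
       }

       // The sketch deliberately models only the INT64 millis/micros range from the PR.
       private static void checkRange(int precision) {
           if (precision < 0 || precision > 6) {
               throw new IllegalArgumentException("precision outside sketched range: " + precision);
           }
       }

       public static void main(String[] args) {
           for (int p = 0; p <= 6; p++) {
               System.out.println("precision " + p + ": " + oldUnit(p) + " -> " + newUnit(p));
           }
       }
   }
   ```

   Running `main` prints one line per precision; only precisions 0 to 3 change, from `MILLIS` to `MICROS`.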
   
   ## Changes
   
   | File | Change |
   |---|---|
   | `ParquetSchemaConverter.java` | Emit a `MICROS` annotation instead of `MILLIS` for `precision <= 3` |
   | `ParquetRowDataWriter.java` | `TimestampMillsWriter.writeTimestamp`: call `value.toMicros()` (equal to `millisecond × 1000` for precision ≤ 3) so the stored `INT64` value matches the `MICROS` annotation unit |
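
   On the writer side the change amounts to scaling the stored value. The sketch below assumes a timestamp represented as epoch milliseconds plus a nano-of-millisecond component; `MicrosEncoding.toMicros` is a hypothetical helper that mirrors the `value.toMicros()` call described in the table, not Paimon's implementation. `Math.multiplyExact` and `Math.addExact` make an out-of-range timestamp fail with an `ArithmeticException` instead of silently overflowing the `INT64`.

   ```java
   final class MicrosEncoding {

       private static final long MICROS_PER_MILLI = 1_000L;
       private static final long NANOS_PER_MICRO = 1_000L;

       /**
        * Hypothetical helper: converts epoch milliseconds plus nano-of-millisecond
        * (0..999_999) into the INT64 value stored under a MICROS annotation.
        */
       static long toMicros(long epochMillis, int nanoOfMillisecond) {
           if (nanoOfMillisecond < 0 || nanoOfMillisecond > 999_999) {
               throw new IllegalArgumentException("nanoOfMillisecond out of range: " + nanoOfMillisecond);
           }
           // multiplyExact/addExact throw ArithmeticException instead of wrapping on overflow.
           long micros = Math.multiplyExact(epochMillis, MICROS_PER_MILLI);
           return Math.addExact(micros, nanoOfMillisecond / NANOS_PER_MICRO);
       }

       public static void main(String[] args) {
           // 2024-01-01T00:00:00.123Z in epoch milliseconds; precision 3, so no sub-millisecond part.
           long millis = 1_704_067_200_123L;
           System.out.println(toMicros(millis, 0));
       }
   }
   ```

   For precision ≤ 3 the nano-of-millisecond part is zero, so the result is exactly `millisecond × 1000`; `main` prints `1704067200123000`.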
   
   The **reader path** (`MILLIS → precision=3`, `MICROS → precision=6`) is 
intentionally left unchanged so that Parquet files written by older Paimon 
versions remain readable without error.
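
   The unchanged reader mapping can be sketched the same way. `FileUnit` and `TimestampReadMapping` are hypothetical names used only for illustration, not Paimon APIs. The sketch shows that a value from an old MILLIS file and the same instant from a new MICROS file decode to the same epoch milliseconds, while the precision inferred from the annotation differs (3 versus 6).

   ```java
   /** Hypothetical stand-in for the timestamp annotation unit read from a Parquet footer. */
   enum FileUnit { MILLIS, MICROS }

   final class TimestampReadMapping {

       /** Precision inferred from the annotation, as described for the unchanged reader path. */
       static int inferredPrecision(FileUnit unit) {
           switch (unit) {
               case MILLIS: return 3;
               case MICROS: return 6;
               default: throw new IllegalArgumentException("unsupported unit: " + unit);
           }
       }

       /** Decodes a stored INT64 to epoch millis; floorDiv rounds pre-1970 values down. */
       static long toEpochMillis(long stored, FileUnit unit) {
           switch (unit) {
               case MILLIS: return stored;
               case MICROS: return Math.floorDiv(stored, 1_000L);
               default: throw new IllegalArgumentException("unsupported unit: " + unit);
           }
       }

       public static void main(String[] args) {
           long oldFileValue = 1_704_067_200_123L;     // written before the PR (MILLIS)
           long newFileValue = 1_704_067_200_123_000L; // same instant written after the PR (MICROS)
           System.out.println(toEpochMillis(oldFileValue, FileUnit.MILLIS)
                   == toEpochMillis(newFileValue, FileUnit.MICROS));
       }
   }
   ```

   `main` prints `true`. Using `Math.floorDiv` rather than plain division means a pre-1970 value such as -500 µs decodes to -1 ms instead of 0.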
   
   ## Backward compatibility
   
   - **Reading old files** (MILLIS annotation): unaffected — the reader maps 
`MILLIS → precision=3` as before.
   - **Writing new files**: `TIMESTAMP(n≤3)` columns now produce 
`MICROS`-annotated `INT64` values (`millisecond × 1000`).
   - **Mixed tables**: a table whose files span a Paimon upgrade will contain
both MILLIS (old) and MICROS (new) files. Paimon reads both, but
Iceberg-aware engines that check the annotation of **every** file may still
reject the old MILLIS files, so affected tables should be rebuilt after
upgrading.
   - **Schema round-trip**: converting `TIMESTAMP(3)` to a Parquet schema and
back now yields `TIMESTAMP(6)`, because the reader maps the MICROS annotation
to precision 6. The `testPaimonParquetSchemaConvert` test is updated to
reflect this.
   
   ## Tests
   
   - `testLowPrecisionTimestampUseMicrosAnnotation` — verifies that 
`createTimestampWithLogicalType` produces a `MICROS` annotation for all 
precision values 0–3, for both `TIMESTAMP` and `TIMESTAMP_WITH_LOCAL_TIME_ZONE`.
   - `testPaimonParquetSchemaConvert` — updated expected round-trip result to 
account for `TIMESTAMP(3)` reading back as `TIMESTAMP(6)`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
