q8webmaster opened a new pull request, #8238:
URL: https://github.com/apache/paimon/pull/8238

   ## Problem
   
   After PR #8230 is applied, `TIMESTAMP(n <= 3)` columns are stored as 
epoch-microseconds under a `MICROS` Parquet annotation. The reader in 
`ParquetTimestampVector.getTimestamp()` for `precision <= 3` still calls 
`Timestamp.fromEpochMillis()`, treating the stored epoch-microsecond value as 
epoch-milliseconds — 1000× too large.
   
   The practical consequences are:
   
   - Any query that reads a low-precision timestamp column returns timestamps with years around 58xxx instead of the correct values.
   - For stored values above `Long.MAX_VALUE / 1000`, the multiplication by 1000 inside `Timestamp.toMicros()` overflows a Java `long`, producing `ArithmeticException: Millis overflow: <negative long>`.
   
   Both problems affect every reader that consumes these columns: Paimon's own vectorised reader, paimon-trino, paimon-spark, and the compact operator (which reads sequence fields such as `op_ts`).
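To make the two failure modes concrete without a Paimon dependency, here is a minimal sketch using only `java.time`. The helpers `asMillis`, `asMicros`, and `millisToMicrosExact` are hypothetical stand-ins (not Paimon code) for, respectively, the buggy `precision <= 3` branch, the epoch-microsecond encoding the writer actually uses after #8230, and the exact multiply assumed to sit inside `Timestamp.toMicros()`.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical helpers (not Paimon code) that reproduce both failure modes with java.time.
final class MisreadDemo {
    // What the buggy precision <= 3 branch effectively does: treat the raw INT64 as epoch-millis.
    static LocalDateTime asMillis(long raw) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(raw), ZoneOffset.UTC);
    }

    // What the writer stores after #8230: epoch-micros.
    // floorDiv/floorMod keep pre-1970 (negative) values correct.
    static LocalDateTime asMicros(long raw) {
        long seconds = Math.floorDiv(raw, 1_000_000L);
        int nanos = (int) (Math.floorMod(raw, 1_000_000L) * 1_000L);
        return LocalDateTime.ofEpochSecond(seconds, nanos, ZoneOffset.UTC);
    }

    // Assumed stand-in for the millis-to-micros step in Timestamp.toMicros():
    // multiplyExact throws ArithmeticException once |millis| exceeds Long.MAX_VALUE / 1000.
    static long millisToMicrosExact(long millis) {
        return Math.multiplyExact(millis, 1_000L);
    }
}
```

Fed the epoch-microsecond encoding of 2024-01-01T00:00, `asMicros` returns the written value, while `asMillis` lands tens of thousands of years in the future; any raw value just past `Long.MAX_VALUE / 1000` makes `millisToMicrosExact` throw.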
   
   ## Root cause
   
   The `precision <= 3` and `precision <= 6` branches in `getTimestamp()` both 
decode INT64 values via `LongColumnVector`, but they use different factory 
methods:
   
   | Branch | Method | Correct for |
   |---|---|---|
   | `precision <= 3` | `Timestamp.fromEpochMillis` | MILLIS-annotated files (pre-#8230) |
   | `precision <= 6` | `Timestamp.fromMicros` | MICROS-annotated files ✅ |
   
   After #8230, `precision <= 3` files carry a `MICROS` annotation and store 
epoch-microseconds — the same encoding as `precision <= 6`. The reader was 
never updated to match.
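The table reduces to one rule: the decoding factory has to match the unit the file's annotation declares. A minimal sketch of that rule, assuming a hypothetical `Unit` enum in place of the unit carried by Parquet's timestamp annotation, and using `java.time.Instant` instead of Paimon's `Timestamp`:

```java
import java.time.Instant;

// Hypothetical sketch: choose the INT64 decoding from the annotation unit, not the declared precision.
final class UnitAwareDecode {
    // Stand-in for the unit recorded in a Parquet TIMESTAMP annotation.
    enum Unit { MILLIS, MICROS }

    static Instant decode(long raw, Unit unit) {
        switch (unit) {
            case MILLIS:
                // Pre-#8230 files: the INT64 is epoch-milliseconds.
                return Instant.ofEpochMilli(raw);
            case MICROS:
                // Post-#8230 files: the INT64 is epoch-microseconds; floorMod keeps the
                // nano adjustment non-negative for pre-1970 values.
                return Instant.ofEpochSecond(
                        Math.floorDiv(raw, 1_000_000L),
                        Math.floorMod(raw, 1_000_000L) * 1_000L);
            default:
                throw new IllegalArgumentException("unsupported unit: " + unit);
        }
    }
}
```

Keying on the annotation is a design alternative, not what this PR does. A branch keyed only on precision cannot tell a pre-#8230 MILLIS file from a post-#8230 MICROS file, so the choice matters only if such older files still need to be read.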
   
   ## Fix
   
   Change the `precision <= 3` branch to call `Timestamp.fromMicros()`, making 
it consistent with the `precision <= 6` branch:
   
   ```diff
    if (precision <= 3 && vector instanceof LongColumnVector) {
   -    return Timestamp.fromEpochMillis(((LongColumnVector) vector).getLong(i));
   +    return Timestamp.fromMicros(((LongColumnVector) vector).getLong(i));
   ```
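Outside Paimon, the effect of the one-line change can be sketched as follows. `Ts` is a hypothetical stand-in for Paimon's `Timestamp` (epoch milliseconds plus nanoseconds within that millisecond), and the body of `fromMicros` is an assumption about how such a factory splits a microsecond value, not a copy of Paimon's implementation.

```java
// Hypothetical sketch of the fixed branch; Ts mimics, but is not, Paimon's Timestamp.
final class FixedBranchSketch {
    static final class Ts {
        final long millisecond;       // epoch milliseconds
        final int nanoOfMillisecond;  // 0..999_999

        Ts(long millisecond, int nanoOfMillisecond) {
            this.millisecond = millisecond;
            this.nanoOfMillisecond = nanoOfMillisecond;
        }
    }

    // Old behaviour: the raw INT64 becomes the millisecond field unchanged.
    static Ts fromEpochMillis(long millis) {
        return new Ts(millis, 0);
    }

    // New behaviour: split epoch-micros into whole millis and leftover nanos.
    // floorDiv/floorMod keep the nano part non-negative for pre-1970 values.
    static Ts fromMicros(long micros) {
        long millis = Math.floorDiv(micros, 1_000L);
        int nanos = (int) (Math.floorMod(micros, 1_000L) * 1_000L);
        return new Ts(millis, nanos);
    }

    // After the fix, precision <= 3 and precision <= 6 decode INT64 the same way,
    // so the two branches are collapsed into one condition here.
    static Ts getTimestamp(int precision, long raw) {
        if (precision <= 6) {
            return fromMicros(raw);
        }
        throw new UnsupportedOperationException("precision > 6 is outside this sketch");
    }
}
```

For a raw value of `1_500`, the old factory yields 1500 ms, while the fixed path yields 1 ms plus 500,000 ns, the instant the writer actually encoded.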
   
   ## Prior art
   
   PR #8230 fixed the writer path: `TIMESTAMP(n <= 3)` now emits a `MICROS` 
annotation and stores epoch-microseconds. This PR is the companion reader fix.
   
   PR #8231 fixed the same `fromEpochMillis` / `fromMicros` inconsistency in 
`IcebergConversions.timestampFromBytes` for precision-3 manifest bounds.
   
   ## Changes
   
   - `ParquetTimestampVector.java`: `precision <= 3` branch calls 
`fromMicros()` instead of `fromEpochMillis()`
   - `ParquetTimestampVectorTest.java`: three unit tests — low-precision MICROS 
decoding, mid-precision MICROS decoding (regression), and a test that documents 
the wrong-year result the bug produces
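The tests themselves are not reproduced here. As a rough, dependency-free sketch of the property they exercise (a millisecond-precision value written as epoch-micros reads back unchanged), assuming hypothetical `encodeMicros`/`decodeMicros` helpers that mimic the post-#8230 writer and the fixed reader:

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Hypothetical round-trip check; neither helper is Paimon code.
final class RoundTripSketch {
    // Writer side after #8230 (assumed): epoch-micros of a UTC LocalDateTime.
    static long encodeMicros(LocalDateTime ts) {
        long seconds = ts.toEpochSecond(ZoneOffset.UTC);
        return Math.addExact(Math.multiplyExact(seconds, 1_000_000L), ts.getNano() / 1_000L);
    }

    // Reader side matching the fixed precision <= 3 branch: decode INT64 as epoch-micros.
    static LocalDateTime decodeMicros(long micros) {
        long seconds = Math.floorDiv(micros, 1_000_000L);
        int nanos = (int) (Math.floorMod(micros, 1_000_000L) * 1_000L);
        return LocalDateTime.ofEpochSecond(seconds, nanos, ZoneOffset.UTC);
    }

    // Truncate to the column's precision (3 = milliseconds), then check the round trip.
    static boolean roundTrips(LocalDateTime ts) {
        LocalDateTime millisOnly = ts.truncatedTo(ChronoUnit.MILLIS);
        return decodeMicros(encodeMicros(millisOnly)).equals(millisOnly);
    }
}
```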


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
