q8webmaster opened a new pull request, #8238:
URL: https://github.com/apache/paimon/pull/8238
## Problem
After PR #8230 is applied, `TIMESTAMP(n <= 3)` columns are stored as
epoch-microseconds under a `MICROS` Parquet annotation. The reader in
`ParquetTimestampVector.getTimestamp()` for `precision <= 3` still calls
`Timestamp.fromEpochMillis()`, so it treats the stored epoch-microsecond value
as epoch-milliseconds and decodes an instant 1000× too far from the epoch.
The practical consequences are:
- Any query that reads a low-precision timestamp column returns a year
around 58xxx instead of the correct value.
- For values near `Long.MAX_VALUE / 1000`, the multiplication inside
`Timestamp.toMicros()` overflows a Java long, producing
`ArithmeticException: Millis overflow: <negative long>`.
Both failures affect every reader that consumes these columns: Paimon's own
vectorised reader, paimon-trino, paimon-spark, and the compact operator (which
reads sequence fields such as `op_ts`).
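The two symptoms can be reproduced without Paimon. The following is a minimal sketch using only `java.time`; the class and method names are hypothetical, and `Math.multiplyExact` is an assumed stand-in for the overflow-checked scaling that `Timestamp.toMicros()` performs, not its actual implementation.

```java
import java.time.Instant;
import java.time.ZoneOffset;

public class MicrosReadAsMillisDemo {

    /** Buggy reading: interprets the raw INT64 as epoch-milliseconds. */
    static int yearIfReadAsMillis(long raw) {
        return Instant.ofEpochMilli(raw).atZone(ZoneOffset.UTC).getYear();
    }

    /** Intended reading: interprets the raw INT64 as epoch-microseconds. */
    static int yearIfReadAsMicros(long raw) {
        long seconds = Math.floorDiv(raw, 1_000_000L);
        long nanos = Math.floorMod(raw, 1_000_000L) * 1_000L;
        return Instant.ofEpochSecond(seconds, nanos).atZone(ZoneOffset.UTC).getYear();
    }

    /** Assumed stand-in for a checked millis-to-micros conversion. */
    static boolean millisToMicrosOverflows(long millis) {
        try {
            Math.multiplyExact(millis, 1_000L);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // 2024-01-01T00:00:00Z expressed in epoch-microseconds.
        long storedMicros = 1_704_067_200_000_000L;
        System.out.println("read as micros: year " + yearIfReadAsMicros(storedMicros));
        System.out.println("read as millis: year " + yearIfReadAsMillis(storedMicros));
        System.out.println("overflow just past Long.MAX_VALUE / 1000: "
                + millisToMicrosOverflows(Long.MAX_VALUE / 1000 + 1));
    }
}
```

Read as microseconds the value lands in 2024; read as milliseconds it lands tens of thousands of years in the future, which matches the wrong-year symptom described above.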
## Root cause
The `precision <= 3` and `precision <= 6` branches in `getTimestamp()` both
decode INT64 values via `LongColumnVector`, but they use different factory
methods:
| Branch | Method | Correct for |
|---|---|---|
| `precision <= 3` | `Timestamp.fromEpochMillis` | MILLIS-annotated files (pre-#8230) |
| `precision <= 6` | `Timestamp.fromMicros` | MICROS-annotated files ✅ |
After #8230, `precision <= 3` files carry a `MICROS` annotation and store
epoch-microseconds — the same encoding as `precision <= 6`. The reader was
never updated to match.
## Fix
Change the `precision <= 3` branch to call `Timestamp.fromMicros()`, making
it consistent with the `precision <= 6` branch:
```diff
 if (precision <= 3 && vector instanceof LongColumnVector) {
-    return Timestamp.fromEpochMillis(((LongColumnVector) vector).getLong(i));
+    return Timestamp.fromMicros(((LongColumnVector) vector).getLong(i));
```
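As a rough illustration of the corrected behaviour, the sketch below models the two INT64 branches with plain `java.time` types. `LowPrecisionReaderSketch`, `fromMicros`, and `decodeInt64` are hypothetical names for this example only; they are not Paimon's `Timestamp` or `ParquetTimestampVector` API, and the sketch assumes timestamps are decoded as UTC wall-clock values.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class LowPrecisionReaderSketch {

    /** Decodes epoch-microseconds; floorDiv/floorMod keep pre-1970 values correct. */
    static LocalDateTime fromMicros(long micros) {
        long seconds = Math.floorDiv(micros, 1_000_000L);
        int nanos = (int) (Math.floorMod(micros, 1_000_000L) * 1_000L);
        return LocalDateTime.ofEpochSecond(seconds, nanos, ZoneOffset.UTC);
    }

    /** Models the post-fix reader: both INT64 branches decode microseconds. */
    static LocalDateTime decodeInt64(int precision, long raw) {
        if (precision <= 3) {
            return fromMicros(raw); // previously decoded as epoch-milliseconds
        } else if (precision <= 6) {
            return fromMicros(raw);
        }
        throw new IllegalArgumentException(
                "precision " + precision + " is not INT64-encoded in this sketch");
    }

    public static void main(String[] args) {
        // 2024-01-01T00:00:00.123Z in epoch-microseconds, as a TIMESTAMP(3) value.
        System.out.println(decodeInt64(3, 1_704_067_200_123_000L));
        // One microsecond before the epoch, as a TIMESTAMP(6) value.
        System.out.println(decodeInt64(6, -1L));
    }
}
```

The two branches are kept separate here only to mirror the shape of the diff; with identical bodies they could be merged, which is a design choice the PR itself does not make.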
## Prior art
PR #8230 fixed the writer path: `TIMESTAMP(n <= 3)` now emits a `MICROS`
annotation and stores epoch-microseconds. This PR is the companion reader fix.
PR #8231 fixed the same `fromEpochMillis` / `fromMicros` inconsistency in
`IcebergConversions.timestampFromBytes` for precision-3 manifest bounds.
## Changes
- `ParquetTimestampVector.java`: `precision <= 3` branch calls
`fromMicros()` instead of `fromEpochMillis()`
- `ParquetTimestampVectorTest.java`: three unit tests — low-precision MICROS
decoding, mid-precision MICROS decoding (regression), and a test that documents
the wrong-year result the bug produces