JingsongLi commented on code in PR #8230:
URL: https://github.com/apache/paimon/pull/8230#discussion_r3419568562


##########
paimon-format/src/main/java/org/apache/parquet/filter2/predicate/ParquetFilters.java:
##########
@@ -299,11 +299,7 @@ private static Comparable<?> toParquetObject(
         } else if (value instanceof Timestamp) {
             Timestamp timestamp = (Timestamp) value;
             int precision = getTimestampPrecision(type);
-            if (precision <= 3) {
-                // milliseconds
-                return timestamp.getMillisecond();
-            } else if (precision <= 6) {
-                // microseconds
+            if (precision <= 6) {
                 return timestamp.toMicros();

Review Comment:
   [P1] This makes Parquet predicate pushdown use epoch micros for every 
`TIMESTAMP(0..3)` column, but existing files written before this change still 
store the same logical columns as `TIMESTAMP_MILLIS` with epoch-millis footer 
values. When a query with a timestamp predicate reads those old files, the 
Parquet row-group filter compares the micros literal against millis min/max 
values and can drop the row group entirely, so Paimon returns missing rows even 
though the normal reader can still decode the file. We need to either build the 
pushed filter from the file annotation (MILLIS vs MICROS) or disable timestamp 
pushdown for these low-precision columns to preserve backward compatibility.



##########
paimon-format/src/main/java/org/apache/paimon/format/parquet/ParquetSimpleStatsExtractor.java:
##########
@@ -223,8 +223,8 @@ private SimpleColStats toTimestampStats(Statistics<?> 
stats, int precision) {
         if (precision <= 3) {
             LongStatistics longStats = (LongStatistics) stats;
             return new SimpleColStats(
-                    Timestamp.fromEpochMillis(longStats.getMin()),
-                    Timestamp.fromEpochMillis(longStats.getMax()),
+                    Timestamp.fromMicros(longStats.getMin()),

Review Comment:
   [P2] The stats extractor also needs to decode based on the Parquet file 
annotation, not only the Paimon field precision. Old Paimon files with 
`TIMESTAMP(0..3)` have `TIMESTAMP_MILLIS` footer stats, so reading them with 
`fromMicros` turns their min/max into values about 1000x too small. This 
affects any path that re-extracts stats from existing Parquet files, such as 
migration/metadata rebuild workflows. Since `stats.type()` carries the 
primitive logical annotation, can we branch on its `TimeUnit` here the same way 
the vector reader does?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to