raunaqmorarka commented on code in PR #216:
URL: https://github.com/apache/parquet-format/pull/216#discussion_r1335484908


##########
src/main/thrift/parquet.thrift:
##########
@@ -216,7 +216,12 @@ struct Statistics {
    /** count of distinct values occurring */
    4: optional i64 distinct_count;
    /**
-    * Min and max values for the column, determined by its ColumnOrder.
+    * lower and upper bound values for the column, determined by its ColumnOrder.

Review Comment:
   @wgtmac @gszadovszky could you please explain why the truncation flag needs 
to be added together with the change proposed here?
   I believe https://issues.apache.org/jira/browse/PARQUET-1685 already broke 
the current spec by allowing min/max values for strings to be truncated. Even 
after we add a flag to indicate truncation, no application can safely assume 
that the min/max stats in existing Parquet files are not truncated for strings. 
So I don't see why adding that flag shouldn't be a separate change to the spec.
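
   To illustrate the concern, here is a minimal sketch of how a writer might 
truncate string statistics. The helper names `truncate_min` and `truncate_max` 
are hypothetical, and this is not the actual writer implementation; it also 
assumes unsigned byte-wise ordering. The point is only that the stored values 
remain valid lower and upper bounds while no longer being actual values from 
the column.

```python
def truncate_min(value: bytes, length: int) -> bytes:
    """Hypothetical helper: a prefix always sorts <= the original value."""
    return value[:length]


def truncate_max(value: bytes, length: int) -> bytes:
    """Hypothetical helper: shorten, then bump the last byte that is not 0xFF."""
    if len(value) <= length:
        return value
    prefix = bytearray(value[:length])
    # Trailing 0xFF bytes cannot be incremented, so drop them first.
    while prefix and prefix[-1] == 0xFF:
        prefix.pop()
    if not prefix:
        return value  # no shorter upper bound exists; keep the full value
    prefix[-1] += 1
    return bytes(prefix)


values = [b"apple pie", b"banana split", b"cherry tart"]
lower = truncate_min(min(values), 4)
upper = truncate_max(max(values), 4)

# Both are still valid bounds for every value in the column...
assert all(lower <= v <= upper for v in values)
# ...but neither is a value that actually occurs in the data.
assert lower not in values and upper not in values
```

   A reader that treats `min`/`max` as exact values (for example, to answer a 
`MIN()` query from statistics alone) would get a wrong result on such a file, 
whether or not a truncation flag is introduced later.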
   



