raunaqmorarka commented on code in PR #216:
URL: https://github.com/apache/parquet-format/pull/216#discussion_r1335484908
##########
src/main/thrift/parquet.thrift:
##########
@@ -216,7 +216,12 @@ struct Statistics {
    /** count of distinct values occurring */
    4: optional i64 distinct_count;
    /**
-    * Min and max values for the column, determined by its ColumnOrder.
+    * lower and upper bound values for the column, determined by its ColumnOrder.

Review Comment:
   @wgtmac @gszadovszky, could you please explain why adding a flag to indicate truncation needs to happen together with the change proposed here? I believe https://issues.apache.org/jira/browse/PARQUET-1685 already broke the current spec by allowing min/max values for strings to be truncated. Even after we add a flag to indicate truncation, no application can safely assume that the min/max statistics for strings in existing Parquet files are not truncated. So I don't see why that flag shouldn't go in as a separate change to the spec.
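   To illustrate why truncated string statistics are bounds rather than exact min/max values, here is a minimal sketch of the general truncation technique. It assumes unsigned lexicographic byte ordering for BINARY values; `truncate_min` and `truncate_max` are hypothetical names, and this is not the parquet-mr code introduced by PARQUET-1685.

```python
from typing import Optional


def truncate_min(value: bytes, length: int) -> bytes:
    # Hypothetical helper: a prefix of the true minimum sorts <= it under
    # unsigned byte-wise comparison, so it is still a valid lower bound.
    return value[:length]


def truncate_max(value: bytes, length: int) -> Optional[bytes]:
    # Hypothetical helper: keep a prefix and increment its last byte that is
    # not 0xFF, so the result sorts >= the true maximum.
    if len(value) <= length:
        return value
    prefix = bytearray(value[:length])
    for i in range(length - 1, -1, -1):
        if prefix[i] < 0xFF:
            prefix[i] += 1
            return bytes(prefix[: i + 1])
        # A 0xFF byte cannot be incremented; drop it and try the previous byte.
    # Every byte in the prefix is 0xFF: no shorter upper bound exists,
    # so the caller would have to keep the full value (or omit max).
    return None


# Usage: the truncated pair still brackets every value in the column...
values = [b"abcdef", b"abcxyz"]
lower = truncate_min(min(values), 3)
upper = truncate_max(max(values), 3)
assert all(lower <= v <= upper for v in values)
# ...but upper == b"abd" does not occur in the column, so a reader that
# treats it as the exact max (e.g. to answer max() from metadata) is wrong.
```

   Filtering and pruning stay correct with such bounds; only readers that need exact values are affected, and they cannot distinguish a truncated bound from an exact value in files written before any truncation flag exists.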