[ 
https://issues.apache.org/jira/browse/PARQUET-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317207#comment-17317207
 ] 

ASF GitHub Bot commented on PARQUET-2016:
-----------------------------------------

pitrou commented on a change in pull request #173:
URL: https://github.com/apache/parquet-format/pull/173#discussion_r609714187



##########
File path: src/main/thrift/parquet.thrift
##########
@@ -1024,17 +1025,20 @@ struct FileMetaData {
   6: optional string created_by
 
   /**
-   * Sort order used for the min_value and max_value fields of each column in
-   * this file. Sort orders are listed in the order matching the columns in the
-   * schema. The indexes are not necessary the same though, because only leaf
-   * nodes of the schema are represented in the list of sort orders.
+   * Sort order used for the min_value and max_value fields in the Statistics
+   * objects and the min_values and max_values fields in the ColumnIndex
+   * objects of each column in this file. Sort orders are listed in the order
+   * matching the columns in the schema. The indexes are not necessary the same
+   * though, because only leaf nodes of the schema are represented in the list
+   * of sort orders.
    *
-   * Without column_orders, the meaning of the min_value and max_value fields 
is
-   * undefined. To ensure well-defined behaviour, if min_value and max_value 
are
-   * written to a Parquet file, column_orders must be written as well.
+   * Without column_orders, the meaning of the min_value and max_value fields
+   * in the Statistics object and the ColumnIndex object is undefined. To 
ensure
+   * well-defined behaviour, these fields are written to a Parquet file,

Review comment:
       "if these fields" perhaps?

##########
File path: src/main/thrift/parquet.thrift
##########
@@ -941,13 +941,14 @@ struct ColumnIndex {
   1: required list<bool> null_pages
 
   /**
-   * Two lists containing lower and upper bounds for the values of each page.
-   * These may be the actual minimum and maximum values found on a page, but
-   * can also be (more compact) values that do not exist on a page. For
-   * example, instead of storing ""Blart Versenwald III", a writer may set
-   * min_values[i]="B", max_values[i]="C". Such more compact values must still
-   * be valid values within the column's logical type. Readers must make sure
-   * that list entries are populated before using them by inspecting 
null_pages.
+   * Two lists containing lower and upper bounds for the values of each page
+   * determined by the ColumnOrder of the column. These may be the actual
+   * minimum and maximum values found on a page, but can also be (more compact)
+   * values that do not exist on a page. For example, instead of storing 
""Blart
+   * Versenwald III", a writer may set min_values[i]="B", max_values[i]="C".

Review comment:
       A bit out of scope, but is there a least of types for which such 
shortening is allowed? Or are writers allowed to store shortened statictics 
even for non-string types such as integers, timestamps...?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


> Reference column_order field from column indexes
> ------------------------------------------------
>
>                 Key: PARQUET-2016
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2016
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-format
>            Reporter: Gabor Szadovszky
>            Assignee: Gabor Szadovszky
>            Priority: Major
>
> We have created the field column_order to specify the ordering of a primitive 
> type. This is used for the row group level statistics but we never referenced 
> this from the column indexes feature while in both cases we heavily rely on 
> the ordering.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to