emkornfield commented on code in PR #197:
URL: https://github.com/apache/parquet-format/pull/197#discussion_r1148481707


##########
src/main/thrift/parquet.thrift:
##########
@@ -223,6 +223,17 @@ struct Statistics {
     */
    5: optional binary max_value;
    6: optional binary min_value;
+   /** The number of bytes the row/group or page would take if encoded with plain-encoding */
+   7: optional i64 plain_encoded_bytes;

Review Comment:
   I'm open to either approach. IIUC, the suggestion here is to change the
   name to something like:
   ```
   /** Optionally set, and only for byte array columns, to help applications
     * determine the total unencoded/uncompressed size of the page.
     * This is equivalent to PlainEncoding(values) - (num_values_encoded * 4),
     * i.e. it includes neither the size needed to record the lengths of the
     * byte arrays nor any size to account for nulls.
     */
   encoded_byte_array_data_bytes
   ```
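
To make the arithmetic concrete, here is a minimal sketch of how such a value could be computed. It assumes the standard PLAIN encoding for BYTE_ARRAY (a 4-byte little-endian length prefix followed by the raw bytes) and that nulls are not part of the encoded values. The function names are hypothetical and are not part of the PR or of any Parquet library.

```python
import struct


def plain_encode_byte_arrays(values):
    """PLAIN-encode non-null BYTE_ARRAY values.

    Each value is written as a 4-byte little-endian length followed by its bytes.
    """
    return b"".join(struct.pack("<I", len(v)) + v for v in values)


def byte_array_data_bytes(values):
    """Hypothetical helper: PlainEncoding(values) - (num_values_encoded * 4).

    None entries stand in for nulls; they are not encoded and add no bytes.
    """
    non_null = [v for v in values if v is not None]
    return len(plain_encode_byte_arrays(non_null)) - 4 * len(non_null)


# Three encoded values (7, 0, and 6 bytes long) plus one null.
values = [b"parquet", None, b"", b"thrift"]
print(byte_array_data_bytes(values))
```

Under these assumptions the result equals the sum of the lengths of the non-null values, so a writer could just as well accumulate `len(v)` directly rather than encode and subtract.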



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
