mapleFU commented on code in PR #197:
URL: https://github.com/apache/parquet-format/pull/197#discussion_r1149338204


##########
src/main/thrift/parquet.thrift:
##########
@@ -190,6 +190,35 @@ enum FieldRepetitionType {
   /** The field is repeated and can contain 0 or more values */
   REPEATED = 2;
 }
+/**
+ * A structure for capturing metadata for estimating the unencoded, uncompressed size
+ * of data.
+ */ 
+struct SizeEstimationStatistics {
+   /** 
+    * The number of logic bytes needed to store present/non-null values.
+    * Unless specified below, the computed size is the size it would take to plain-encode the underlying
+    * physical type.
+    * Special calculations:
+    *  - Enum: plain-encoded BYTE_ARRAY size
+    *  - Integers (same size used for signed and unsigned): int8 - 1 bytes, int16 - 2 
+    *  - Decimal - Each value is assumed to take the minimal number of bytes necessary to encode

Review Comment:
   It seems that a small Decimal can be encoded as FLBA or BYTE_ARRAY, but a
   big Decimal cannot be stored as an i32. Should we require the size to be
   computed from the physical type, or at least define it relative to the
   physical type?



