pitrou commented on code in PR #242:
URL: https://github.com/apache/parquet-format/pull/242#discussion_r1609894846


##########
src/main/thrift/parquet.thrift:
##########
@@ -1165,6 +1317,62 @@ struct FileMetaData {
   9: optional binary footer_signing_key_metadata
 }
 
+/** Metadata for a column in this file. */
+struct FileColumnMetadataV3 {
+  /** All column chunks in this file (one per row group) **/
+  1: required list<ColumnChunkV3> columns
+
+  /** Sort order used for the Statistics min_value and max_value fields
+   **/
+  2: optional ColumnOrder column_order;
+
+  /** NEW from v1: Optional key/value metadata for this column at the file level
+   **/
+  3: optional list<KeyValue> key_value_metadata

Review Comment:
   > Putting all user-defined metadata in a list is subject to limitations from thrift. That's why we have to care about its size.
   
   Ok, but is it a practical concern? In Parquet C++ we have:
   ```c++
   constexpr int32_t kDefaultThriftStringSizeLimit = 100 * 1000 * 1000;
   // Structs in the thrift definition are relatively large (at least 300 bytes).
   // This limits total memory to the same order of magnitude as
   // kDefaultStringSizeLimit.
   constexpr int32_t kDefaultThriftContainerSizeLimit = 1000 * 1000;
   ```
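   
   To put rough numbers on that, here is a minimal sketch of the arithmetic behind those two defaults. The 300-byte figure is only the lower bound mentioned in the source comment, and `ContainerBytesAtLimit` and `FitsDefaultLimits` are hypothetical helpers written for this illustration, not Parquet C++ APIs. It also assumes the limits apply per string and per container, respectively:
   
   ```cpp
   #include <cassert>
   #include <cstdint>
   
   // Default limits as quoted from Parquet C++.
   constexpr int32_t kDefaultThriftStringSizeLimit = 100 * 1000 * 1000;
   constexpr int32_t kDefaultThriftContainerSizeLimit = 1000 * 1000;
   
   // Assumption: 300 bytes per struct, the lower bound from the source comment.
   constexpr int64_t kAssumedMinStructBytes = 300;
   
   // Hypothetical helper: memory used by one container filled up to the
   // element limit, given an assumed per-struct size in bytes.
   constexpr int64_t ContainerBytesAtLimit(int64_t struct_bytes) {
     return static_cast<int64_t>(kDefaultThriftContainerSizeLimit) * struct_bytes;
   }
   
   // Hypothetical helper: whether a key/value list with `num_entries` entries,
   // whose largest single value is `largest_value_bytes` long, stays within
   // both default limits (assuming one limit per container, one per string).
   constexpr bool FitsDefaultLimits(int64_t num_entries,
                                    int64_t largest_value_bytes) {
     return num_entries <= kDefaultThriftContainerSizeLimit &&
            largest_value_bytes <= kDefaultThriftStringSizeLimit;
   }
   
   // One million structs of 300 bytes each come to 300 MB, the same order of
   // magnitude as the 100 MB string limit.
   static_assert(ContainerBytesAtLimit(kAssumedMinStructBytes) == 300000000,
                 "container at the element limit is about 300 MB");
   ```
   
   Under these assumptions, `FitsDefaultLimits(1000, 1024)` is true, so a list of a thousand small entries is nowhere near either default.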
   
   > For now, we can only choose a "black hole" from somewhere in the file and put its offset/length pair into the `key_value_metadata` if we want to add custom index.
   
   Well, you could also have a specially named column with a single defined BYTE_ARRAY value holding the piece of metadata you care about (or you could model it more finely using Parquet types).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

