wgtmac commented on code in PR #45375:
URL: https://github.com/apache/arrow/pull/45375#discussion_r1962879360


##########
cpp/src/parquet/types.cc:
##########
@@ -1619,6 +1629,22 @@ class LogicalType::Impl::Float16 final : public 
LogicalType::Impl::Incompatible,
 
 GENERATE_MAKE(Float16)
 
+class LogicalType::Impl::Variant final : public 
LogicalType::Impl::Incompatible,
+                                         public 
LogicalType::Impl::SimpleApplicable {
+ public:
+  friend class VariantLogicalType;
+
+  OVERRIDE_TOSTRING(Variant)
+  OVERRIDE_TOTHRIFT(VariantType, VARIANT)
+
+ private:
+  Variant()
+      : LogicalType::Impl(LogicalType::Type::VARIANT, SortOrder::UNKNOWN),
+        LogicalType::Impl::SimpleApplicable(parquet::Type::BYTE_ARRAY) {}

Review Comment:
   Sorry that I may have not explained it clearly.
   
   Parquet has two different kind of nodes: `primitive` and `group`. 
`primitive` node is usually for a primitive physical type (e.g. int64, double, 
binary, etc.) while `group` is for a complex type (e.g. struct, map, list, 
etc.). Whatever the node type is, it occupies a `SchemaElement` in the Parquet 
schema: 
https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L502
   
   `LogicalType` can be assigned to both primitive and group node. Most logical 
types annotate a primitive node, examples are DECIMAL, TIMESTAMP, etc. However, 
`List/Map/Variant` are logical types that must annotate a group node.
   
   For `List` and `Map`, their group structures are dynamic because of 
different subtypes (the element type of a List or the key/value types of a 
Map). For `Variant`, its group structure is also dynamic depending on whether 
it is shredded and shredded value types.
   
   > I updated the thrift definition of variant to include required metadata 
and value binary members
   
   Based on above explanation, we cannot modify `parquet.thrift`. To facilitate 
the current development, maybe we can define a `VariantExtensionType` similar 
to implementations at 
https://github.com/apache/arrow/tree/01e3f1e6829d6fcc9021ac47aebb6350590ca134/cpp/src/arrow/extension
 with a storage type of `struct<metadata:binary,value:binary>`. Once stable, we 
can make it canonical following the procedure at 
https://arrow.apache.org/docs/format/CanonicalExtensions.html 
   
   Do you have any opinion? @emkornfield @pitrou @mapleFU 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to