kbuci commented on issue #18711: URL: https://github.com/apache/hudi/issues/18711#issuecomment-4411765009
Yes, I'd prefer that as well: passing around the Hoodie schema as the "authoritative source" to infer HUDI logical types, since that's more understandable/clean and seems to be the precedent in HUDI Spark. My only concern was figuring out whether that breaks any public APIs, but I can assess that as I create the PR, and if it does, we can make sure to land any such PR in 1.3+ (and not in 1.2).

I'm still familiarizing myself with other table formats, but based on a very rough initial search, it seems other table formats might also be leaning towards approach A in practice?

## Iceberg

- **Iceberg Spec (v3, includes Variant type definition)**: https://iceberg.apache.org/spec
- **Iceberg Schemas doc (field IDs, type system)**: https://iceberg.apache.org/docs/latest/schemas
- **PR: Add variant type support to ParquetTypeVisitor**: https://github.com/apache/iceberg/pull/14588
- **PR: Implement Variant Parquet readers**: https://github.com/apache/iceberg/pull/12139
- **PR: Spec - add variant type**: https://github.com/apache/iceberg/pull/10831
- **Snowflake blog: Iceberg v3 Variant Type**: https://www.snowflake.com/en/engineering-blog/apache-iceberg-v3-variant-type/

## Delta Lake

- **Delta Variant Type RFC**: https://github.com/delta-io/delta/blob/master/protocol_rfcs/accepted/variant-type.md
- **Delta Protocol (v4.2.0, schema in transaction log)**: https://github.com/delta-io/delta/blob/v4.2.0/PROTOCOL.md
- **PR: Add VariantType support in Spark schema conversion**: https://github.com/delta-io/delta/pull/6164
- **PR: Kernel-level variant schema deserialization**: https://github.com/delta-io/delta/pull/3464
- **Delta Variant Shredding RFC**: https://github.com/delta-io/delta/blob/master/protocol_rfcs/variant-shredding.md

## Paimon

- **PIP-40: Introduce a new Vector data type**: https://cwiki.apache.org/confluence/display/PAIMON/PIP-40%3A+Introduce+a+new+Vector+data+type
- **Issue: Introduce VecType**: https://github.com/apache/paimon/issues/7011
- **PR: Add Flink support for VectorType**: https://github.com/apache/paimon/pull/7238
- **Paimon FileFormat spec (Parquet type mapping)**: https://paimon.apache.org/docs/1.4/concepts/spec/fileformat/
- **Paimon ParquetSchemaConverter API**: https://paimon.apache.org/docs/0.9/api/java/org/apache/paimon/format/parquet/ParquetSchemaConverter.html

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
