alamb commented on issue #6736: URL: https://github.com/apache/arrow-rs/issues/6736#issuecomment-2781556103
I have some news I would like to share here -- it seems that @PinkCrow007 has actually been working on a Variant implementation in parquet (including support in arrow-rs as an extension type).

Here is an update from [Martin Prammer](https://www.cs.cmu.edu/~mprammer/) (not sure if he has a GitHub handle):

> We've made progress towards implementing the Variant type in both Parquet_rs and Arrow_rs and have prepared a document, [shared as a Google doc](https://docs.google.com/document/d/1Wv6FOFsZGibdd9hscorzXx0PwAAADQrQHTn36cjJKHA/edit?tab=t.0), that details the overall project and our current status. In summary, our current prototypes are focused on round-tripping binary data between Parquet and Arrow. The Arrow-side Variant is implemented as a CanonicalExtensionType, while the Parquet-side Variant is a LogicalType. If you're interested in looking at the code early, [Jiaying's fork](https://github.com/apache/arrow-rs/compare/main...PinkCrow007:arrow-rs:variant-clean) is publicly available. Our next goal is to implement binary data decoding/encoding to facilitate using Variants as a stand-alone type, which will then allow us to implement Variant shredding. While there's still work to do before we have the basic functionality for a Variant type, we plan to PR the baseline variant and then address shredding.
>
> At this point, it would be helpful for our team to connect to the broader Apache ecosystem's discussion on Variants; Jiaying has already joined the Arrow discord, and we're both happy to join any relevant mailing lists. We're also soliciting existing Variant implementations that we can use to verify our library against.

It seems they also need some example variant data to make faster progress. I will go beg some more from the parquet mailing list.

It is very exciting to see the momentum picking up.