Thanks, Gang, for initiating the discussion in the Parquet community. Thanks, Gene, for bringing it up here.
We're more than happy to assist with anything needed for the migration. Please let us know how we can help!

Yufei

On Mon, Aug 26, 2024 at 2:35 AM Gang Wu <ust...@gmail.com> wrote:
> Hi,
>
> There is a relevant discussion on dev@parquet:
> https://lists.apache.org/thread/6h58hj39lhqtcyd2hlsyvqm4lzdh4b9z
>
> The feedback looks promising. Looking forward to cooperating with the Spark community!
>
> Best regards,
> Gang
>
> On Thu, Aug 22, 2024 at 10:20 PM Eduard Tudenhöfner <etudenhoef...@apache.org> wrote:
>
>> +1 on moving this to the Parquet project/community (assuming that the Parquet community is OK with this)
>>
>> On Thu, Aug 22, 2024 at 3:02 AM Chao Sun <sunc...@apache.org> wrote:
>>
>>> +1 too
>>>
>>> On Wed, Aug 21, 2024 at 4:43 PM huaxin gao <huaxin.ga...@gmail.com> wrote:
>>>
>>>> +1 for moving the variant type to Parquet, as it promotes standardization and interoperability across numerous projects.
>>>>
>>>> Huaxin
>>>>
>>>> On Wed, Aug 21, 2024 at 1:28 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>>>
>>>>> Agreed that Parquet would be a good place to host the new type. Different table formats, like Iceberg and Delta, can benefit from it, since they are already based on Parquet.
>>>>>
>>>>> Yufei
>>>>>
>>>>> On Wed, Aug 21, 2024 at 12:15 AM Alkis Evlogimenos <alkis.evlogime...@databricks.com.invalid> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> In addition to everything said above, it is also a great opportunity for wider testing and possibly tweaking the spec before it takes off post-standardization.
>>>>>>
>>>>>> On Tue, Aug 20, 2024 at 4:36 PM Russell Spitzer <russell.spit...@gmail.com> wrote:
>>>>>>
>>>>>>> I think this would be a great move to encourage all sorts of engines and table formats to take advantage of the variant type and make sure it remains compatible between all those systems.
>>>>>>>
>>>>>>> I strongly support this,
>>>>>>> Russ
>>>>>>>
>>>>>>> On Tue, Aug 20, 2024 at 8:06 AM Fokko Driesprong <fo...@apache.org> wrote:
>>>>>>>
>>>>>>>> Hey everyone,
>>>>>>>>
>>>>>>>> I agree the Parquet project is a good place to host and evolve the spec (we could store it in parquet-variant?). We would need to align this with the Parquet project. Anyway, I'm familiar with both Iceberg and Parquet and happy to help where needed.
>>>>>>>>
>>>>>>>> Kind regards,
>>>>>>>> Fokko
>>>>>>>>
>>>>>>>> On Mon, Aug 19, 2024 at 16:36 Reynold Xin <r...@databricks.com.invalid> wrote:
>>>>>>>>
>>>>>>>>> As I said on dev@iceberg, it'd be really unfortunate if we end up with two or even more diverging specs for storing variants. It just adds more interoperability work for everybody. Parquet would be a great home for this spec as a neutral project that almost all the other important projects in this space depend on as the de facto standard for physical data encoding and storage. So if we can collaborate with the Parquet community and get this into Parquet to avoid each project building its own spec, that'd be amazing.
>>>>>>>>>
>>>>>>>>> On Sat, Aug 17, 2024 at 2:56 AM Gene Pang <gene.p...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> I am one of the main developers implementing Variant in Spark. The specification and all the code are currently merged into the common/variant <https://github.com/apache/spark/tree/master/common/variant> package in the Spark repo.
>>>>>>>>>>
>>>>>>>>>> There has been growing interest from other projects (such as Iceberg) in supporting Variant, and we think that moving the Variant spec and implementation out to a new home might be the best way for all the different projects to be able to use and collaborate on Variant. We originally put all the Variant code under common/variant with the expectation that it would eventually be moved elsewhere.
>>>>>>>>>>
>>>>>>>>>> We are proposing that we move the Variant spec and implementation out of the Spark project to the Parquet project. Spark depends heavily on Parquet, and the Variant spec contains a lot of details on the physical storage layer, such as shredding. The Parquet project would be a great place to standardize the Variant data type and to enable interoperability across many different projects. However, even after we move Variant out, we expect to retain compatibility with the current Spark implementation.
>>>>>>>>>>
>>>>>>>>>> What do people think? There are probably many details we still need to figure out in terms of moving the implementation, but at a high level, does it make sense to move Variant to Parquet?
>>>>>>>>>>
>>>>>>>>>> I appreciate your feedback!
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Gene