Thanks for all the replies. Since I opened the topic of the specification, let me come up with a proposal that I look forward to iterating on with the help of the community.
Laurent

On Wed, Jun 11, 2025 at 4:12 PM yun zou <yunzou.colost...@gmail.com> wrote:

> Yes. I think we have agreed that we will make sure things are described
> clearly in both the spec and the website for the critical fields added.
>
> We are currently trying to get a webpage out for the Generic Table support
> in Polaris.
>
> Best Regards,
> Yun
>
> On Wed, Jun 11, 2025 at 3:09 PM Dmitri Bourlatchkov <di...@apache.org>
> wrote:
>
> > > As for the evolution, I do think it is a good strategy to evolve step by
> > > step, instead of trying to standardize everything in one shot.
> >
> > This approach makes sense to me, but we need to be explicit about it in
> > the spec.
> >
> > Cheers,
> > Dmitri.
> >
> > On Wed, Jun 11, 2025 at 5:45 PM yun zou <yunzou.colost...@gmail.com>
> > wrote:
> >
> > > > I mean a doc page similar to [1] that explains what Generic Tables are,
> > > > how to use them in Spark, how to use them in some other query engine,
> > > > and most importantly the planned evolution for the Generic Tables API
> > > > and specification.
> > >
> > > Yes, we can definitely add a webpage to describe the current guarantee of
> > > generic table support, and we can mention that it is currently a beta
> > > version. I am currently working on this.
> > >
> > > As for the evolution, I do think it is a good strategy to evolve step by
> > > step, instead of trying to standardize everything in one shot.
> > > As we have mentioned during design discussion, standardization of some
> > > fields across different engines and different formats is very
> > > challenging, such as schema, where different engines support different
> > > data types. So we will need more thought when adding those fields; the
> > > base location is just one of the easy fields.
> > >
> > > Best Regards,
> > > Yun
> > >
> > > On Wed, Jun 11, 2025 at 2:17 PM Dmitri Bourlatchkov <di...@apache.org>
> > > wrote:
> > >
> > > > > Can you explain what is a proper plain English spec for this feature?
> > > >
> > > > I mean a doc page similar to [1] that explains what Generic Tables are,
> > > > how to use them in Spark, how to use them in some other query engine,
> > > > and most importantly the planned evolution for the Generic Tables API
> > > > and specification.
> > > >
> > > > IMHO, given this discussion thread, we can only offer a "beta" in 1.0.
> > > > Meaning the spec and API are subject to change without backward
> > > > compatibility guarantees.
> > > >
> > > > [1] https://polaris.apache.org/in-dev/unreleased/policy/
> > > >
> > > > Cheers,
> > > > Dmitri.
> > > >
> > > > On Wed, Jun 11, 2025 at 4:46 PM Yufei Gu <flyrain...@gmail.com> wrote:
> > > >
> > > > > There are solid use cases for adding generic-table support with the
> > > > > Spark plugin:
> > > > >
> > > > >    - Single Catalog, Many Formats – Keep Delta, CSV, Parquet (and
> > > > >      future formats) side-by-side in one place instead of juggling
> > > > >      separate catalogs.
> > > > >    - Seamless Migrations – Let teams move data from one format to
> > > > >      another without breaking queries or governance workflows.
> > > > >
> > > > > Happy to brainstorm more improvements and next steps!
> > > > >
> > > > > > Now that [1543] is merged and adds some concrete specialization to
> > > > > > Generic Tables API, I believe it is even more important to make a
> > > > > > proper plain English spec for this feature before 1.0.
> > > > >
> > > > > We've cut the branch for the 1.0 release already, and PR 1543 won't be
> > > > > a part of the 1.0 release. Can you explain what is a proper plain
> > > > > English spec for this feature? I am glad to review it if you propose
> > > > > one.
> > > > >
> > > > > Yufei
> > > > >
> > > > > On Wed, Jun 11, 2025 at 11:53 AM Dmitri Bourlatchkov <di...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Thanks, Laurent, for bringing up spec "readiness" and, I guess, by
> > > > > > extension backward compatibility concerns.
> > > > > >
> > > > > > Regardless of how deep the current spec is in Polaris, I believe it
> > > > > > is important to have it written down as an artifact in the Polaris
> > > > > > repo. I know we had a design doc at some point, but the project is
> > > > > > defined by what is in the repository, plus discussion docs can
> > > > > > quickly get out of sync with actual code. I believe I raised this
> > > > > > point before.
> > > > > >
> > > > > > The API change merged under [1543] is not sufficient to inform users
> > > > > > of Polaris about the Generic Tables feature. I tend to regard
> > > > > > comments in Open API yaml files as similar to javadoc. They are good
> > > > > > for developers working with that specific aspect of the system, but
> > > > > > do not provide a holistic view.
> > > > > >
> > > > > > Now that [1543] is merged and adds some concrete specialization to
> > > > > > Generic Tables API, I believe it is even more important to make a
> > > > > > proper plain English spec for this feature before 1.0.
> > > > > >
> > > > > > [1543] https://github.com/apache/polaris/pull/1543
> > > > > >
> > > > > > Cheers,
> > > > > > Dmitri.
> > > > > >
> > > > > > On Wed, Jun 11, 2025 at 10:56 AM Laurent Goujon
> > > > > > <laur...@dremio.com.invalid> wrote:
> > > > > >
> > > > > > > What I was trying to say is that I'm sure there's plenty of value
> > > > > > > for Spark, but in its current state the value is little from a
> > > > > > > Polaris point of view as an open catalog service?
> > > > > > >
> > > > > > > Of course we can follow up on that, but is the current spec still
> > > > > > > considered WIP, or when 1.0 is released, would we have to keep
> > > > > > > supporting it even if we come up with something more comprehensive?
> > > > > > >
> > > > > > > On Wed, Jun 11, 2025, 00:22 Eric Maynard <eric.w.mayn...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > > I don't think there's a lot of value where the specification of
> > > > > > > > > a table format is left to the client
> > > > > > > >
> > > > > > > > Considering that you currently can use non-Iceberg tables in
> > > > > > > > Polaris with the Spark client and it works end-to-end, I'd have a
> > > > > > > > hard time agreeing that there is no value.
> > > > > > > >
> > > > > > > > But I think this discussion is maybe best moved to another thread.
> > > > > > > > The incremental change to add a location may make sense for the
> > > > > > > > existing generic table implementation, even if later we reach a
> > > > > > > > consensus to rip it out and replace it with something more
> > > > > > > > "comprehensive".
> > > > > > > >
> > > > > > > > --EM