Hi Laurent,

Thanks a lot for the reference. You are right: the current generic table support is very general, and it has no specific fields to describe metadata, schema, and so on. Most of the interpretation responsibility falls on the client. For example, we have shipped a Polaris Spark Client to help Spark interact with generic tables.
The spec can definitely evolve to include more standardized fields that help cross-engine sharing. Adding location is one step in that direction, since it is a critical piece of information for sharing. If there are other important fields or APIs that we think could improve cross-engine sharing, we can definitely open another thread to discuss them.

Best Regards,
Yun

On Tue, Jun 10, 2025 at 6:53 PM Eric Maynard <eric.w.mayn...@gmail.com> wrote:

> To be clear, this thread is about adding a field to the already-existing
> type. It sounds like you’re advocating for adding even more fields?
>
> On Tue, Jun 10, 2025 at 5:59 PM Laurent Goujon <laur...@dremio.com.invalid>
> wrote:
>
> > I appreciate the sizable amount of effort being put in here, but isn't the
> > generic table proposal as it is today too generic? Without some description
> > of the available types, without some metadata information (like the
> > schema), without some protocol on how a client should actually interpret
> > the content/interact with the table, aren't we fragmenting instead of
> > unifying?
> >
> > Delta, for example, is working on a catalog protocol:
> > https://github.com/delta-io/delta/issues/4381
> >
> > Laurent
> >
> > On Tue, Jun 10, 2025 at 5:01 PM Eric Maynard <eric.w.mayn...@gmail.com>
> > wrote:
> >
> > > After this latest round of changes, it looks good to me too! Thanks for
> > > working on this.
> > >
> > > On Tue, Jun 10, 2025 at 4:56 PM Yufei Gu <flyrain...@gmail.com> wrote:
> > >
> > > > Sounds good to me.
> > > > Yufei
> > > >
> > > > On Tue, Jun 10, 2025 at 3:53 PM yun zou <yunzou.colost...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Team,
> > > > >
> > > > > Thanks a lot for all the valuable feedback!
> > > > >
> > > > > I want to bump this thread up and see if we can conclude on the
> > > > > direction to move on.
> > > > >
> > > > > For the V1 generic table spec, we would like to start with support of
> > > > > single location, and leave multiple location support as an open
> > > > > discussion which could be introduced later.
> > > > >
> > > > > A new base-location field will be added to the generic table spec with
> > > > > the following description:
> > > > > - The base location is in URI format.
> > > > > - The table base location is a location that includes all files for
> > > > >   the table.
> > > > > - A table with multiple disjoint locations (i.e. containing files that
> > > > >   are outside the configured base location) is not compliant with the
> > > > >   current generic table support.
> > > > > - If no location is provided, clients or users are responsible for
> > > > >   managing the location.
> > > > >
> > > > > We will also add a dedicated webpage for Polaris Generic Table to
> > > > > describe all functionality and key fields clearly.
> > > > >
> > > > > If there is no objection for the current plan, we would like to move on
> > > > > for the PR review:
> > > > > https://github.com/apache/polaris/pull/1543/files
> > > > >
> > > > > Best Regards,
> > > > > Yun
> > > > >
> > > > > On Thu, May 22, 2025 at 7:32 PM yun zou <yunzou.colost...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > > This is a stricter requirement than we have for Iceberg tables. Are
> > > > > > > we really going to enforce this? How will we do it efficiently? If
> > > > > > > not, let's not put it in the spec.
> > > > > >
> > > > > > The efficiency is a good point, if we are supporting arbitrary nested
> > > > > > namespaces, the efficiency is definitely a concern.
> > > > > > Maybe we can restrict that for generic tables, but I think it would
> > > > > > be good for us to stay consistent with Iceberg tables on this, since
> > > > > > we share the namespace concept. We can exclude this from the spec.
> > > > > > However, I do think that is the right restriction to put on both
> > > > > > Iceberg and generic tables for a better security guarantee; maybe we
> > > > > > can have a separate discussion on this topic.
> > > > > >
> > > > > > > It would be trivial to add update support for generic entities. Why
> > > > > > > canonicalize this restriction in the spec? We don't, for example,
> > > > > > > currently detail a restriction around the fact that you can't
> > > > > > > change a generic table's format.
> > > > > >
> > > > > > Sure, we don't have to mention this in the spec.
> > > > > >
> > > > > > > generic tables are a catch-all type not specific to any format
> > > > > > > (including Iceberg)
> > > > > >
> > > > > > Generic Table APIs today have a clear separation from Iceberg table
> > > > > > APIs. I don't think we want to close the door on that, and that is
> > > > > > also why I think "generic" is a good name. However, if we want to
> > > > > > move on to include certain semantics for Iceberg tables (for example,
> > > > > > showing Iceberg tables in list tables), the API endpoints will be
> > > > > > repurposed, and I think it would be more proper to do that in a V2
> > > > > > spec, because people will have to use those APIs differently.
> > > > > >
> > > > > > > GenericTableEntity is the type I'm most likely to look to for the
> > > > > > > conversion service, which means it will indeed be used to represent
> > > > > > > Iceberg tables.
> > > > > >
> > > > > > For conversion, if we are converting a table to an Iceberg table, and
> > > > > > the table only has one root location, the target Iceberg table will
> > > > > > also have one root location, so I don't see a problem with this. If
> > > > > > we are converting from an Iceberg table to a target format that only
> > > > > > supports one location, I don't see a problem either.
> > > > > >
> > > > > > Even with the Iceberg table spec today, I believe the locations it
> > > > > > has are: top level location, metadata.path, and data.path. I don't
> > > > > > think that can be achieved with an array of locations either, because
> > > > > > an array cannot tell which path is for metadata and which path is for
> > > > > > data. I don't think relying on the size and position of an array is a
> > > > > > good idea, and that extra path information can continue to be
> > > > > > represented with generic tables using properties and the top level
> > > > > > location.
> > > > > > Even with all those location configurations, I don't think the
> > > > > > Iceberg spec is capturing all locations a table can have, because
> > > > > > every snapshot can potentially write into a different location, and
> > > > > > those are not tracked anywhere by anyone today.
> > > > > > Furthermore, it might require more information than just a location;
> > > > > > for example, it might need to be associated with the snapshot.
> > > > > > I know Dennis was discussing a multi-location spec for Iceberg, but
> > > > > > the information needed seems more complicated than just a list of
> > > > > > locations.
> > > > > > Support for tables with multiple locations seems to me a bigger topic
> > > > > > that requires much more thought. Again, I am not saying we shouldn't
> > > > > > support it in the future, but I think we should put more thought into
> > > > > > how tables with multiple locations work before we start supporting
> > > > > > those.
> > > > > >
> > > > > > > The multi-location support in Polaris seems not very well also, the
> > > > > > > overlap check and credential vending seems all done with one
> > > > > > > location
> > > > > >
> > > > > > Sorry, I think I misread the caller of the code for the overlap
> > > > > > check. Dennis mentioned that we only use one location for
> > > > > > credentials, but it might be for something else.
> > > > > >
> > > > > > Best Regards,
> > > > > > Yun
> > > > > >
> > > > > > On Thu, May 22, 2025 at 3:08 PM Eric Maynard
> > > > > > <eric.w.mayn...@gmail.com> wrote:
> > > > > >
> > > > > >> > I meant no two tables under the same catalog can have the same
> > > > > >> > location
> > > > > >>
> > > > > >> This is a stricter requirement than we have for Iceberg tables. Are
> > > > > >> we really going to enforce this? How will we do it efficiently? If
> > > > > >> not, let's not put it in the spec.
> > > > > >>
> > > > > >> > we do not have any update support
> > > > > >>
> > > > > >> It would be trivial to add update support for generic entities. Why
> > > > > >> canonicalize this restriction in the spec? We don't, for example,
> > > > > >> currently detail a restriction around the fact that you can't change
> > > > > >> a generic table's format.
> > > > > >>
> > > > > >> > generic tables are designed for non-Iceberg tables today,
> > > > > >>
> > > > > >> I don't actually think this is true. There's nothing about generic
> > > > > >> tables that make them more useful for Delta tables than Iceberg
> > > > > >> tables, for example.
> > > > > >> On the contrary, I initially proposed the name "generic" in part to
> > > > > >> capture that generic tables are a catch-all type not specific to any
> > > > > >> format (including Iceberg). More practically, GenericTableEntity is
> > > > > >> the type I'm most likely to look to for the conversion service,
> > > > > >> which means it will indeed be used to represent Iceberg tables.
> > > > > >>
> > > > > >> > The multi-location support in Polaris seems not very well also,
> > > > > >> > the overlap check and credential vending seems all done with one
> > > > > >> > location
> > > > > >>
> > > > > >> This is not true.
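
For readers of this thread, the base-location rule proposed above (the base location is a URI, and every file of the table must live under it) can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions: the function names `is_under_base_location` and `files_outside_base` and the example `s3://` paths are hypothetical and are not part of the Polaris spec or code base, and the thread itself leaves open whether and how such a check would be enforced.

```python
import posixpath
from urllib.parse import urlparse


def is_under_base_location(base_location: str, file_uri: str) -> bool:
    """Return True if file_uri lies strictly inside base_location.

    Hypothetical helper: both arguments are assumed to be URIs of the
    form scheme://authority/path.
    """
    base = urlparse(base_location)
    target = urlparse(file_uri)
    # A file in a different scheme or bucket/host is never under the base.
    if (base.scheme, base.netloc) != (target.scheme, target.netloc):
        return False
    # Normalize paths so that "a/../b" style segments cannot escape the base.
    base_path = posixpath.normpath(base.path or "/").rstrip("/") + "/"
    target_path = posixpath.normpath(target.path or "/")
    # The trailing slash on base_path keeps ".../tbl2" from matching ".../tbl".
    return target_path.startswith(base_path)


def files_outside_base(base_location: str, file_uris: list[str]) -> list[str]:
    """List the files that would make a table non-compliant with the rule."""
    return [u for u in file_uris if not is_under_base_location(base_location, u)]
```

With a base location of `s3://bucket/warehouse/db/tbl`, a file at `s3://bucket/warehouse/db/tbl/data/part-0.parquet` is inside the base, while files under `s3://bucket/warehouse/db/tbl2/` or in another bucket are reported by `files_outside_base`. A table with any such file would fall into the "multiple disjoint locations" case described in the proposal.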