I appreciate the sizable amount of effort being put here but isn't the generic table proposal as is it today too generic? Without some description of the available types, without some metadata information (like the schema), without some protocol on how a client should actually interpret the content/interact with the table, aren't we fragmenting instead of unifying?
Delta for example is working on some catalog protocol for instance: https://github.com/delta-io/delta/issues/4381 Laurent On Tue, Jun 10, 2025 at 5:01 PM Eric Maynard <eric.w.mayn...@gmail.com> wrote: > After this latest round of changes, it looks good to me too! Thanks for > working on this. > > On Tue, Jun 10, 2025 at 4:56 PM Yufei Gu <flyrain...@gmail.com> wrote: > > > Sounds good to me. > > Yufei > > > > > > On Tue, Jun 10, 2025 at 3:53 PM yun zou <yunzou.colost...@gmail.com> > > wrote: > > > > > Hi Team, > > > > > > Thanks a lot for all the valuable feedback! > > > > > > I want to bump this thread up and see if we can conclude on the > direction > > > to move on. > > > > > > For the V1 generic table spec, we would like to start with support of > > > single location, and leave multiple location > > > support as an open discussion which could be introduced later. > > > > > > A new base-location field will be added to the generic table spec with > > the > > > following description: > > > - The base location is in URI format. > > > - The table base location is a location that includes all files for the > > > table. > > > - A table with multiple disjoint locations (i.e. containing files that > > are > > > outside the configured base location) is not compliant with the current > > > generic table support. > > > - If no location is provided, clients or users are responsible for > > > managing the location. > > > > > > We will also add a dedicated webpage for Polaris Generic Table to > > describe > > > all functionality and key fields clearly. > > > > > > If there is no objection for the current plan, we would like to move on > > for > > > the PR review: > > > https://github.com/apache/polaris/pull/1543/files > > > > > > Best Regards, > > > Yun > > > > > > > > > > > > On Thu, May 22, 2025 at 7:32 PM yun zou <yunzou.colost...@gmail.com> > > > wrote: > > > > > > > > This is a stricter requirement than we have for Iceberg tables. Are > > we > > > > really going to enforce this? How will we do it efficiently? If not, > > > let's > > > > not put it in the spec. > > > > > > > > The efficiency is a good point, if we are supporting > > > > arbitrary nested namespaces, > > > > the efficiency is definitely a concern. Maybe we can restrict that > for > > > > generic tables, > > > > but I think it would be good for us to stay consistent with Iceberg > > > > tables on this, > > > > since we share the namespace concept. > > > > We can exclude this from the spec. However, I do think that is the > > right > > > > restriction > > > > to put for both Iceberg and generic tables for better security > > guarantee, > > > > maybe we > > > > can do a separate discussion on this topic. > > > > > > > > >It would be trivial to add update support for generic entities. Why > > > > canonicalize this restriction in the spec? We don't, for example, > > > currently > > > > detail a restriction around the fact that you can't change a generic > > > > table's format. > > > > > > > > Sure, we don't have to mention this in the Spec. > > > > > > > > > generic tables are a catch-all type not specific to any > > > > format (including Iceberg) > > > > > > > > Generic Table APis today have a clear separation with Iceberg table > > APIs. > > > > I don't think we want to close > > > > the door for that, and that is also why I think "generic" is a good > > name. > > > > However, if want to move on to > > > > include certain semantics for iceberg tables, for example, showing > > > iceberg > > > > tables in list tables, there will be a repurpose of the API > endpoints, > > > and > > > > I think it would be more proper to > > > > move on for V2 spec, because people will have to use those > > > > APIs differently. > > > > > > > > > GenericTableEntity is the > > > > type I'm most likely to look to for the conversion service, which > means > > > it > > > > will indeed be used to represent Iceberg tables. > > > > > > > > For conversion, if we are converting a table to an iceberg table, and > > the > > > > table only > > > > has one root location, the target iceberg table will also have one > root > > > > location, so I don't see > > > > a problem with this. If we are converting from an iceberg table to a > > > > target format that only > > > > supports one location, I don't see a problem also. > > > > > > > > Even with Iceberg table spec today, I believe the locations it has > are > > : > > > > top level location, > > > > metadata.path, and data.path. I don't think that can be achieved with > > an > > > > array of locations also, > > > > Because it can not tell which path is for metadata, which path is for > > > > data, I don't think relying on > > > > the size and position of an array is a good idea, and that extra path > > > > information can continue > > > > be represented with generic tables using properties and top level > > > location. > > > > Even with all those location configurations, I don't think Iceberg > spec > > > is > > > > capturing all locations a table can have, > > > > because every snapshot can potentially write into a different > location, > > > > and those are not tracked anywhere by anyone today. > > > > Furthermore it might require information more than just a location, > for > > > > example, it might need to be associated with the snapshot. > > > > I know Dennis was discussing a multi-location spec for Iceberg, but > > > > the information needed seems more > > > > complicated than just a list of locations. > > > > Table with multiple location support seems a bigger topic that > requires > > > > much more thought to me, again I am not saying > > > > we shouldn't support it in the future, but I think we should put more > > > > thought into how tables with multiple locations > > > > work before we start supporting those. > > > > > > > > > The multi-location support in Polaris seems not very well also, the > > > > overlap check and credential vending seems all done with one location > > > > Sorry, i think i misread the caller of the code for the overlap > check. > > > > Dennis mentioned that we only use one location for credential, > > > > but it might be for something else. > > > > > > > > Best Regards, > > > > Yun > > > > > > > > > > > > > > > > On Thu, May 22, 2025 at 3:08 PM Eric Maynard < > eric.w.mayn...@gmail.com > > > > > > > wrote: > > > > > > > >> > i meant no two tables under the same catalog can have the same > > > location > > > >> > > > >> This is a stricter requirement than we have for Iceberg tables. Are > we > > > >> really going to enforce this? How will we do it efficiently? If not, > > > let's > > > >> not put it in the spec. > > > >> > > > >> > we do not have any update support > > > >> > > > >> It would be trivial to add update support for generic entities. Why > > > >> canonicalize this restriction in the spec? We don't, for example, > > > >> currently > > > >> detail a restriction around the fact that you can't change a generic > > > >> table's format. > > > >> > > > >> > generic tables are designed for non-Iceberg tables today, > > > >> > > > >> I don't actually think this is true. There's nothing about generic > > > tables > > > >> that make them more useful for Delta tables than Iceberg tables, for > > > >> example. On the contrary, I initially proposed the name "generic" in > > > part > > > >> to capture that generic tables are a catch-all type not specific to > > any > > > >> format (including Iceberg). More practically, GenericTableEntity is > > the > > > >> type I'm most likely to look to for the conversion service, which > > means > > > it > > > >> will indeed be used to represent Iceberg tables. > > > >> > > > >> > The multi-location support in Polaris seems not very well also, > the > > > >> overlap check and credential vending seems all done with one > location > > > >> > > > >> This is not true. > > > >> > > > > > > > > > >