After this latest round of changes, it looks good to me too! Thanks for working on this.
On Tue, Jun 10, 2025 at 4:56 PM Yufei Gu <flyrain...@gmail.com> wrote: > Sounds good to me. > Yufei > > > On Tue, Jun 10, 2025 at 3:53 PM yun zou <yunzou.colost...@gmail.com> > wrote: > > > Hi Team, > > > > Thanks a lot for all the valuable feedback! > > > > I want to bump this thread up and see if we can conclude on the direction > > to move on. > > > > For the V1 generic table spec, we would like to start with support of > > single location, and leave multiple location > > support as an open discussion which could be introduced later. > > > > A new base-location field will be added to the generic table spec with > the > > following description: > > - The base location is in URI format. > > - The table base location is a location that includes all files for the > > table. > > - A table with multiple disjoint locations (i.e. containing files that > are > > outside the configured base location) is not compliant with the current > > generic table support. > > - If no location is provided, clients or users are responsible for > > managing the location. > > > > We will also add a dedicated webpage for Polaris Generic Table to > describe > > all functionality and key fields clearly. > > > > If there is no objection for the current plan, we would like to move on > for > > the PR review: > > https://github.com/apache/polaris/pull/1543/files > > > > Best Regards, > > Yun > > > > > > > > On Thu, May 22, 2025 at 7:32 PM yun zou <yunzou.colost...@gmail.com> > > wrote: > > > > > > This is a stricter requirement than we have for Iceberg tables. Are > we > > > really going to enforce this? How will we do it efficiently? If not, > > let's > > > not put it in the spec. > > > > > > The efficiency is a good point, if we are supporting > > > arbitrary nested namespaces, > > > the efficiency is definitely a concern. Maybe we can restrict that for > > > generic tables, > > > but I think it would be good for us to stay consistent with Iceberg > > > tables on this, > > > since we share the namespace concept. > > > We can exclude this from the spec. However, I do think that is the > right > > > restriction > > > to put for both Iceberg and generic tables for better security > guarantee, > > > maybe we > > > can do a separate discussion on this topic. > > > > > > >It would be trivial to add update support for generic entities. Why > > > canonicalize this restriction in the spec? We don't, for example, > > currently > > > detail a restriction around the fact that you can't change a generic > > > table's format. > > > > > > Sure, we don't have to mention this in the Spec. > > > > > > > generic tables are a catch-all type not specific to any > > > format (including Iceberg) > > > > > > Generic Table APis today have a clear separation with Iceberg table > APIs. > > > I don't think we want to close > > > the door for that, and that is also why I think "generic" is a good > name. > > > However, if want to move on to > > > include certain semantics for iceberg tables, for example, showing > > iceberg > > > tables in list tables, there will be a repurpose of the API endpoints, > > and > > > I think it would be more proper to > > > move on for V2 spec, because people will have to use those > > > APIs differently. > > > > > > > GenericTableEntity is the > > > type I'm most likely to look to for the conversion service, which means > > it > > > will indeed be used to represent Iceberg tables. > > > > > > For conversion, if we are converting a table to an iceberg table, and > the > > > table only > > > has one root location, the target iceberg table will also have one root > > > location, so I don't see > > > a problem with this. If we are converting from an iceberg table to a > > > target format that only > > > supports one location, I don't see a problem also. > > > > > > Even with Iceberg table spec today, I believe the locations it has are > : > > > top level location, > > > metadata.path, and data.path. I don't think that can be achieved with > an > > > array of locations also, > > > Because it can not tell which path is for metadata, which path is for > > > data, I don't think relying on > > > the size and position of an array is a good idea, and that extra path > > > information can continue > > > be represented with generic tables using properties and top level > > location. > > > Even with all those location configurations, I don't think Iceberg spec > > is > > > capturing all locations a table can have, > > > because every snapshot can potentially write into a different location, > > > and those are not tracked anywhere by anyone today. > > > Furthermore it might require information more than just a location, for > > > example, it might need to be associated with the snapshot. > > > I know Dennis was discussing a multi-location spec for Iceberg, but > > > the information needed seems more > > > complicated than just a list of locations. > > > Table with multiple location support seems a bigger topic that requires > > > much more thought to me, again I am not saying > > > we shouldn't support it in the future, but I think we should put more > > > thought into how tables with multiple locations > > > work before we start supporting those. > > > > > > > The multi-location support in Polaris seems not very well also, the > > > overlap check and credential vending seems all done with one location > > > Sorry, i think i misread the caller of the code for the overlap check. > > > Dennis mentioned that we only use one location for credential, > > > but it might be for something else. > > > > > > Best Regards, > > > Yun > > > > > > > > > > > > On Thu, May 22, 2025 at 3:08 PM Eric Maynard <eric.w.mayn...@gmail.com > > > > > wrote: > > > > > >> > i meant no two tables under the same catalog can have the same > > location > > >> > > >> This is a stricter requirement than we have for Iceberg tables. Are we > > >> really going to enforce this? How will we do it efficiently? If not, > > let's > > >> not put it in the spec. > > >> > > >> > we do not have any update support > > >> > > >> It would be trivial to add update support for generic entities. Why > > >> canonicalize this restriction in the spec? We don't, for example, > > >> currently > > >> detail a restriction around the fact that you can't change a generic > > >> table's format. > > >> > > >> > generic tables are designed for non-Iceberg tables today, > > >> > > >> I don't actually think this is true. There's nothing about generic > > tables > > >> that make them more useful for Delta tables than Iceberg tables, for > > >> example. On the contrary, I initially proposed the name "generic" in > > part > > >> to capture that generic tables are a catch-all type not specific to > any > > >> format (including Iceberg). More practically, GenericTableEntity is > the > > >> type I'm most likely to look to for the conversion service, which > means > > it > > >> will indeed be used to represent Iceberg tables. > > >> > > >> > The multi-location support in Polaris seems not very well also, the > > >> overlap check and credential vending seems all done with one location > > >> > > >> This is not true. > > >> > > > > > >