Sounds good to me. Yufei
On Tue, Jun 10, 2025 at 3:53 PM yun zou <yunzou.colost...@gmail.com> wrote: > Hi Team, > > Thanks a lot for all the valuable feedback! > > I want to bump this thread up and see if we can conclude on the direction > to move on. > > For the V1 generic table spec, we would like to start with support of > single location, and leave multiple location > support as an open discussion which could be introduced later. > > A new base-location field will be added to the generic table spec with the > following description: > - The base location is in URI format. > - The table base location is a location that includes all files for the > table. > - A table with multiple disjoint locations (i.e. containing files that are > outside the configured base location) is not compliant with the current > generic table support. > - If no location is provided, clients or users are responsible for > managing the location. > > We will also add a dedicated webpage for Polaris Generic Table to describe > all functionality and key fields clearly. > > If there is no objection for the current plan, we would like to move on for > the PR review: > https://github.com/apache/polaris/pull/1543/files > > Best Regards, > Yun > > > > On Thu, May 22, 2025 at 7:32 PM yun zou <yunzou.colost...@gmail.com> > wrote: > > > > This is a stricter requirement than we have for Iceberg tables. Are we > > really going to enforce this? How will we do it efficiently? If not, > let's > > not put it in the spec. > > > > The efficiency is a good point, if we are supporting > > arbitrary nested namespaces, > > the efficiency is definitely a concern. Maybe we can restrict that for > > generic tables, > > but I think it would be good for us to stay consistent with Iceberg > > tables on this, > > since we share the namespace concept. > > We can exclude this from the spec. However, I do think that is the right > > restriction > > to put for both Iceberg and generic tables for better security guarantee, > > maybe we > > can do a separate discussion on this topic. > > > > >It would be trivial to add update support for generic entities. Why > > canonicalize this restriction in the spec? We don't, for example, > currently > > detail a restriction around the fact that you can't change a generic > > table's format. > > > > Sure, we don't have to mention this in the Spec. > > > > > generic tables are a catch-all type not specific to any > > format (including Iceberg) > > > > Generic Table APis today have a clear separation with Iceberg table APIs. > > I don't think we want to close > > the door for that, and that is also why I think "generic" is a good name. > > However, if want to move on to > > include certain semantics for iceberg tables, for example, showing > iceberg > > tables in list tables, there will be a repurpose of the API endpoints, > and > > I think it would be more proper to > > move on for V2 spec, because people will have to use those > > APIs differently. > > > > > GenericTableEntity is the > > type I'm most likely to look to for the conversion service, which means > it > > will indeed be used to represent Iceberg tables. > > > > For conversion, if we are converting a table to an iceberg table, and the > > table only > > has one root location, the target iceberg table will also have one root > > location, so I don't see > > a problem with this. If we are converting from an iceberg table to a > > target format that only > > supports one location, I don't see a problem also. > > > > Even with Iceberg table spec today, I believe the locations it has are : > > top level location, > > metadata.path, and data.path. I don't think that can be achieved with an > > array of locations also, > > Because it can not tell which path is for metadata, which path is for > > data, I don't think relying on > > the size and position of an array is a good idea, and that extra path > > information can continue > > be represented with generic tables using properties and top level > location. > > Even with all those location configurations, I don't think Iceberg spec > is > > capturing all locations a table can have, > > because every snapshot can potentially write into a different location, > > and those are not tracked anywhere by anyone today. > > Furthermore it might require information more than just a location, for > > example, it might need to be associated with the snapshot. > > I know Dennis was discussing a multi-location spec for Iceberg, but > > the information needed seems more > > complicated than just a list of locations. > > Table with multiple location support seems a bigger topic that requires > > much more thought to me, again I am not saying > > we shouldn't support it in the future, but I think we should put more > > thought into how tables with multiple locations > > work before we start supporting those. > > > > > The multi-location support in Polaris seems not very well also, the > > overlap check and credential vending seems all done with one location > > Sorry, i think i misread the caller of the code for the overlap check. > > Dennis mentioned that we only use one location for credential, > > but it might be for something else. > > > > Best Regards, > > Yun > > > > > > > > On Thu, May 22, 2025 at 3:08 PM Eric Maynard <eric.w.mayn...@gmail.com> > > wrote: > > > >> > i meant no two tables under the same catalog can have the same > location > >> > >> This is a stricter requirement than we have for Iceberg tables. Are we > >> really going to enforce this? How will we do it efficiently? If not, > let's > >> not put it in the spec. > >> > >> > we do not have any update support > >> > >> It would be trivial to add update support for generic entities. Why > >> canonicalize this restriction in the spec? We don't, for example, > >> currently > >> detail a restriction around the fact that you can't change a generic > >> table's format. > >> > >> > generic tables are designed for non-Iceberg tables today, > >> > >> I don't actually think this is true. There's nothing about generic > tables > >> that make them more useful for Delta tables than Iceberg tables, for > >> example. On the contrary, I initially proposed the name "generic" in > part > >> to capture that generic tables are a catch-all type not specific to any > >> format (including Iceberg). More practically, GenericTableEntity is the > >> type I'm most likely to look to for the conversion service, which means > it > >> will indeed be used to represent Iceberg tables. > >> > >> > The multi-location support in Polaris seems not very well also, the > >> overlap check and credential vending seems all done with one location > >> > >> This is not true. > >> > > >