Hi Team,

Thanks a lot for all the valuable feedback!
I want to bump this thread and see if we can settle on a direction. For the V1 generic table spec, we would like to start with support for a single location and leave multi-location support as an open discussion that could be introduced later.

A new base-location field will be added to the generic table spec with the following description:

- The base location is in URI format.
- The table base location is a location that includes all files for the table.
- A table with multiple disjoint locations (i.e. containing files that are outside the configured base location) is not compliant with the current generic table support.
- If no location is provided, clients or users are responsible for managing the location.

We will also add a dedicated webpage for Polaris Generic Table that describes all functionality and key fields clearly.

If there are no objections to the current plan, we would like to move on to the PR review:
https://github.com/apache/polaris/pull/1543/files

Best Regards,
Yun

On Thu, May 22, 2025 at 7:32 PM yun zou <yunzou.colost...@gmail.com> wrote:

> > This is a stricter requirement than we have for Iceberg tables. Are we
> > really going to enforce this? How will we do it efficiently? If not,
> > let's not put it in the spec.
>
> The efficiency is a good point. If we are supporting arbitrary nested
> namespaces, the efficiency is definitely a concern. Maybe we can restrict
> that for generic tables, but I think it would be good for us to stay
> consistent with Iceberg tables on this, since we share the namespace
> concept.
> We can exclude this from the spec. However, I do think that is the right
> restriction to put on both Iceberg and generic tables for a better
> security guarantee. Maybe we can have a separate discussion on this topic.
>
> > It would be trivial to add update support for generic entities. Why
> > canonicalize this restriction in the spec?
> > We don't, for example, currently detail a restriction around the fact
> > that you can't change a generic table's format.
>
> Sure, we don't have to mention this in the spec.
>
> > generic tables are a catch-all type not specific to any format
> > (including Iceberg)
>
> Generic table APIs today have a clear separation from the Iceberg table
> APIs. I don't think we want to close the door on that, and that is also
> why I think "generic" is a good name. However, if we want to move on to
> include certain semantics for Iceberg tables, for example showing Iceberg
> tables in list tables, the API endpoints will be repurposed, and I think
> it would be more appropriate to do that in a V2 spec, because people will
> have to use those APIs differently.
>
> > GenericTableEntity is the type I'm most likely to look to for the
> > conversion service, which means it will indeed be used to represent
> > Iceberg tables.
>
> For conversion, if we are converting a table to an Iceberg table, and the
> table only has one root location, the target Iceberg table will also have
> one root location, so I don't see a problem with this. If we are
> converting from an Iceberg table to a target format that only supports one
> location, I don't see a problem either.
>
> Even with the Iceberg table spec today, I believe the locations it has
> are: the top-level location, metadata.path, and data.path. I don't think
> that can be achieved with an array of locations either, because an array
> cannot tell which path is for metadata and which path is for data. I don't
> think relying on the size and position of an array is a good idea, and
> that extra path information can continue to be represented with generic
> tables using properties and the top-level location.
> Even with all those location configurations, I don't think the Iceberg
> spec is capturing all locations a table can have, because every snapshot
> can potentially write into a different location, and those are not tracked
> anywhere by anyone today.
> Furthermore, it might require more information than just a location; for
> example, it might need to be associated with the snapshot.
> I know Dennis was discussing a multi-location spec for Iceberg, but the
> information needed seems more complicated than just a list of locations.
> Support for tables with multiple locations seems to me a bigger topic that
> requires much more thought. Again, I am not saying we shouldn't support it
> in the future, but I think we should put more thought into how tables with
> multiple locations work before we start supporting them.
>
> > The multi-location support in Polaris seems not very well also, the
> > overlap check and credential vending seems all done with one location
>
> Sorry, I think I misread the caller of the code for the overlap check.
> Dennis mentioned that we only use one location for credentials,
> but it might be for something else.
>
> Best Regards,
> Yun
>
> On Thu, May 22, 2025 at 3:08 PM Eric Maynard <eric.w.mayn...@gmail.com>
> wrote:
>
>> > i meant no two tables under the same catalog can have the same location
>>
>> This is a stricter requirement than we have for Iceberg tables. Are we
>> really going to enforce this? How will we do it efficiently? If not, let's
>> not put it in the spec.
>>
>> > we do not have any update support
>>
>> It would be trivial to add update support for generic entities. Why
>> canonicalize this restriction in the spec? We don't, for example, currently
>> detail a restriction around the fact that you can't change a generic
>> table's format.
>>
>> > generic tables are designed for non-Iceberg tables today,
>>
>> I don't actually think this is true. There's nothing about generic tables
>> that make them more useful for Delta tables than Iceberg tables, for
>> example. On the contrary, I initially proposed the name "generic" in part
>> to capture that generic tables are a catch-all type not specific to any
>> format (including Iceberg).
>> More practically, GenericTableEntity is the type I'm most likely to look
>> to for the conversion service, which means it will indeed be used to
>> represent Iceberg tables.
>>
>> > The multi-location support in Polaris seems not very well also, the
>> > overlap check and credential vending seems all done with one location
>>
>> This is not true.