To be clear, this thread is about adding a field to the already-existing type. It sounds like you’re advocating for adding even more fields?
On Tue, Jun 10, 2025 at 5:59 PM Laurent Goujon <laur...@dremio.com.invalid> wrote:
> I appreciate the sizable amount of effort being put in here, but isn't the generic table proposal, as it is today, too generic? Without some description of the available types, without some metadata information (like the schema), and without some protocol for how a client should actually interpret the content of, and interact with, the table, aren't we fragmenting instead of unifying?
>
> Delta, for example, is working on a catalog protocol:
> https://github.com/delta-io/delta/issues/4381
>
> Laurent
>
> On Tue, Jun 10, 2025 at 5:01 PM Eric Maynard <eric.w.mayn...@gmail.com> wrote:
> > After this latest round of changes, it looks good to me too! Thanks for working on this.
> >
> > On Tue, Jun 10, 2025 at 4:56 PM Yufei Gu <flyrain...@gmail.com> wrote:
> > > Sounds good to me.
> > > Yufei
> > >
> > > On Tue, Jun 10, 2025 at 3:53 PM yun zou <yunzou.colost...@gmail.com> wrote:
> > > > Hi Team,
> > > >
> > > > Thanks a lot for all the valuable feedback!
> > > >
> > > > I want to bump this thread and see if we can settle on a direction.
> > > >
> > > > For the V1 generic table spec, we would like to start with support for a single location, and leave multi-location support as an open discussion; it could be introduced later.
> > > >
> > > > A new base-location field will be added to the generic table spec with the following description:
> > > > - The base location is in URI format.
> > > > - The table base location is a location that includes all files for the table.
> > > > - A table with multiple disjoint locations (i.e., containing files that are outside the configured base location) is not compliant with the current generic table support.
> > > > - If no location is provided, clients or users are responsible for managing the location.
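The base-location rule in the list above amounts to a prefix-containment test. As a rough sketch only (Python, assuming locations are plain URIs such as s3://bucket/path that are already normalized, with no ".." segments), a client could flag files that fall outside the configured base location as follows. The function name is hypothetical and is not part of the Polaris API or the PR under review. When no base location is provided, the rule does not apply, since the client manages locations itself.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def files_outside_base_location(base_location: str, file_uris: Iterable[str]) -> List[str]:
    """Return the file URIs that are NOT under base_location (hypothetical helper).

    Assumes URIs are already normalized (no '..' segments, no duplicate slashes).
    Under the proposed rule, a table is compliant iff the result is empty.
    """
    base = urlsplit(base_location)
    # Treat the base as a directory so "s3://b/t1" does not contain "s3://b/t10/...".
    base_path = base.path.rstrip("/") + "/"
    outside = []
    for uri in file_uris:
        f = urlsplit(uri)
        same_root = (f.scheme, f.netloc) == (base.scheme, base.netloc)
        if not (same_root and f.path.startswith(base_path)):
            outside.append(uri)
    return outside


if __name__ == "__main__":
    files = [
        "s3://bucket/db/t1/data/part-0.parquet",
        "s3://bucket/db/t10/data/part-0.parquet",  # sibling table, not nested
        "s3://other-bucket/db/t1/data/part-1.parquet",  # different bucket
    ]
    print(files_outside_base_location("s3://bucket/db/t1", files))
```

Comparing scheme and authority separately from the path keeps a different bucket with a coincidentally similar path from being treated as "inside" the base location.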
> > > >
> > > > We will also add a dedicated webpage for Polaris Generic Table to describe all functionality and key fields clearly.
> > > >
> > > > If there are no objections to the current plan, we would like to move on to the PR review:
> > > > https://github.com/apache/polaris/pull/1543/files
> > > >
> > > > Best Regards,
> > > > Yun
> > > >
> > > > On Thu, May 22, 2025 at 7:32 PM yun zou <yunzou.colost...@gmail.com> wrote:
> > > > > > This is a stricter requirement than we have for Iceberg tables. Are we really going to enforce this? How will we do it efficiently? If not, let's not put it in the spec.
> > > > >
> > > > > Efficiency is a good point: if we support arbitrarily nested namespaces, efficiency is definitely a concern. Maybe we can restrict that for generic tables, but I think it would be good for us to stay consistent with Iceberg tables on this, since we share the namespace concept.
> > > > > We can exclude this from the spec. However, I do think it is the right restriction for both Iceberg and generic tables, since it gives a better security guarantee; maybe we can have a separate discussion on this topic.
> > > > >
> > > > > > It would be trivial to add update support for generic entities. Why canonicalize this restriction in the spec? We don't, for example, currently detail a restriction around the fact that you can't change a generic table's format.
> > > > >
> > > > > Sure, we don't have to mention this in the spec.
> > > > >
> > > > > > generic tables are a catch-all type not specific to any format (including Iceberg)
> > > > >
> > > > > Generic Table APIs today have a clear separation from Iceberg table APIs.
> > > > > I don't think we want to close the door on that, and that is also why I think "generic" is a good name. However, if we want to include certain semantics for Iceberg tables (for example, showing Iceberg tables in list tables), the API endpoints would have to be repurposed, and I think it would be more appropriate to do that in a V2 spec, because people will have to use those APIs differently.
> > > > >
> > > > > > GenericTableEntity is the type I'm most likely to look to for the conversion service, which means it will indeed be used to represent Iceberg tables.
> > > > >
> > > > > For conversion, if we are converting a table to an Iceberg table and the table only has one root location, the target Iceberg table will also have one root location, so I don't see a problem with this. If we are converting from an Iceberg table to a target format that only supports one location, I don't see a problem either.
> > > > >
> > > > > Even with the Iceberg table spec today, I believe the locations it has are: the top-level location, metadata.path, and data.path. I don't think that can be captured with an array of locations either, because an array cannot tell which path is for metadata and which is for data. I don't think relying on the size and position of an array is a good idea, and that extra path information can continue to be represented in generic tables using properties and the top-level location.
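The point that named properties convey more than positions in an array can be illustrated with a small sketch. The GenericTable shape and the resolve_location helper below are hypothetical (they are not the Polaris model classes), and the property keys write.data.path and write.metadata.path are borrowed from Iceberg's table-property naming purely as an assumption; the generic table spec under discussion does not define them. With a bare list of locations, nothing says which entry is data and which is metadata; with named keys plus a base location, each role resolves unambiguously.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class GenericTable:
    """Hypothetical generic table shape: one base location plus free-form properties."""
    name: str
    format: str
    base_location: Optional[str] = None
    properties: Dict[str, str] = field(default_factory=dict)


def resolve_location(table: GenericTable, role: str) -> Optional[str]:
    """Resolve the location for a named role such as "data" or "metadata".

    An explicit "write.<role>.path" property wins (key naming assumed, borrowed
    from Iceberg); otherwise the location is derived as "<base_location>/<role>".
    Returns None when neither is available, i.e. the client manages the location.
    """
    explicit = table.properties.get(f"write.{role}.path")
    if explicit is not None:
        return explicit
    if table.base_location is None:
        return None
    return table.base_location.rstrip("/") + "/" + role


if __name__ == "__main__":
    t = GenericTable(
        name="events",
        format="delta",
        base_location="s3://bucket/db/events",
        properties={"write.metadata.path": "s3://bucket/meta/events"},
    )
    print(resolve_location(t, "data"))      # derived from the base location
    print(resolve_location(t, "metadata"))  # taken from the named property
```

Deriving a default from the base location is one possible convention, assumed here for illustration; a spec could equally require every role-specific path to be explicit.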
> > > > > Even with all those location configurations, I don't think the Iceberg spec captures all the locations a table can have, because every snapshot can potentially write into a different location, and those are not tracked anywhere by anyone today. Furthermore, it might require more information than just a location; for example, it might need to be associated with a snapshot.
> > > > > I know Dennis was discussing a multi-location spec for Iceberg, but the information needed seems more complicated than just a list of locations.
> > > > > Multi-location table support seems to me a bigger topic that requires much more thought. Again, I am not saying we shouldn't support it in the future, but I think we should put more thought into how tables with multiple locations work before we start supporting them.
> > > > >
> > > > > > The multi-location support in Polaris doesn't seem to work very well either; the overlap check and credential vending all seem to be done with one location.
> > > > >
> > > > > Sorry, I think I misread the caller of the code for the overlap check. Dennis mentioned that we only use one location for credentials, but that might be for something else.
> > > > >
> > > > > Best Regards,
> > > > > Yun
> > > > >
> > > > > On Thu, May 22, 2025 at 3:08 PM Eric Maynard <eric.w.mayn...@gmail.com> wrote:
> > > > > > > I meant no two tables under the same catalog can have the same location
> > > > > >
> > > > > > This is a stricter requirement than we have for Iceberg tables. Are we really going to enforce this? How will we do it efficiently? If not, let's not put it in the spec.
> > > > > >
> > > > > > > we do not have any update support
> > > > > >
> > > > > > It would be trivial to add update support for generic entities. Why canonicalize this restriction in the spec? We don't, for example, currently detail a restriction around the fact that you can't change a generic table's format.
> > > > > >
> > > > > > > generic tables are designed for non-Iceberg tables today,
> > > > > >
> > > > > > I don't actually think this is true. There's nothing about generic tables that makes them more useful for Delta tables than Iceberg tables, for example. On the contrary, I initially proposed the name "generic" in part to capture that generic tables are a catch-all type not specific to any format (including Iceberg). More practically, GenericTableEntity is the type I'm most likely to look to for the conversion service, which means it will indeed be used to represent Iceberg tables.
> > > > > >
> > > > > > > The multi-location support in Polaris doesn't seem to work very well either; the overlap check and credential vending all seem to be done with one location.
> > > > > >
> > > > > > This is not true.
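On Eric's question of how a rule like "no two tables under the same catalog can have the same location" could be enforced efficiently: one option, even with arbitrarily nested namespaces, is to keep every table location in a single sorted index per catalog, so each check is a binary search instead of a scan over all tables. The sketch below is a hypothetical illustration, not how Polaris implements its overlap check. It assumes locations are already-normalized URI strings and checks the broader condition that no location equals, contains, or is nested inside another. Any existing entry equal to or nested under a new location sorts first among the entries at or after it, and because the index never admits a conflicting entry, an existing parent can only be the immediate predecessor, so two neighbor comparisons are enough.

```python
import bisect
from typing import List


def _as_dir(location: str) -> str:
    # Compare locations as directories so "s3://b/t1" is not treated as a
    # parent of "s3://b/t10".
    return location if location.endswith("/") else location + "/"


class LocationIndex:
    """Hypothetical per-catalog index of table locations, kept sorted."""

    def __init__(self) -> None:
        self._sorted: List[str] = []

    def conflicts(self, location: str) -> bool:
        """True if `location` equals, contains, or is nested in an indexed location."""
        loc = _as_dir(location)
        i = bisect.bisect_left(self._sorted, loc)
        # Entries equal to or nested under `loc` sort first among those >= loc.
        if i < len(self._sorted) and self._sorted[i].startswith(loc):
            return True
        # The index never holds overlapping entries, so a parent of `loc`
        # can only be its immediate predecessor.
        return i > 0 and loc.startswith(self._sorted[i - 1])

    def add(self, location: str) -> None:
        if self.conflicts(location):
            raise ValueError(f"location overlaps an existing table: {location}")
        bisect.insort(self._sorted, _as_dir(location))


if __name__ == "__main__":
    idx = LocationIndex()
    idx.add("s3://bucket/ns1/t1")
    print(idx.conflicts("s3://bucket/ns1/t1"))         # same location
    print(idx.conflicts("s3://bucket/ns1/t1/nested"))  # nested inside t1
    print(idx.conflicts("s3://bucket/ns1"))            # contains t1
    print(idx.conflicts("s3://bucket/ns1/t10"))        # sibling, no overlap
```

A Python list keeps the sketch short, but bisect.insort is O(n) per insert; a persistent store would more likely rely on a database index ordered by location, where the same predecessor/successor lookups are logarithmic.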