Sounds good to me.
Yufei

On Tue, Jun 10, 2025 at 3:53 PM yun zou <yunzou.colost...@gmail.com> wrote:

> Hi Team,
>
> Thanks a lot for all the valuable feedback!
>
> I want to bump this thread and see if we can settle on a direction for
> moving forward.
>
> For the V1 generic table spec, we would like to start with support for a
> single location and leave multi-location support as an open discussion
> that could be introduced later.
>
> A new base-location field will be added to the generic table spec with the
> following description:
> - The base location is in URI format.
> - The table base location is a location that includes all files for the
> table.
> - A table with multiple disjoint locations (i.e. containing files that are
> outside the configured base location) is not compliant with the current
> generic table support.
> - If no location is provided, clients or users are responsible for
> managing the location.
>
> We will also add a dedicated webpage for Polaris Generic Table to describe
> all functionality and key fields clearly.
>
> If there are no objections to the current plan, we would like to move on
> to the PR review:
> https://github.com/apache/polaris/pull/1543/files
>
> Best Regards,
> Yun
>
>
>
> On Thu, May 22, 2025 at 7:32 PM yun zou <yunzou.colost...@gmail.com>
> wrote:
>
> > > This is a stricter requirement than we have for Iceberg tables. Are we
> > really going to enforce this? How will we do it efficiently? If not,
> > let's not put it in the spec.
> >
> > Efficiency is a good point. If we support arbitrarily nested namespaces,
> > efficiency is definitely a concern. Maybe we can restrict that for
> > generic tables, but I think it would be good for us to stay consistent
> > with Iceberg tables on this, since we share the namespace concept.
> > We can exclude this from the spec. However, I do think it is the right
> > restriction for both Iceberg and generic tables, for a better security
> > guarantee. Maybe we can have a separate discussion on this topic.
> >
> > > It would be trivial to add update support for generic entities. Why
> > canonicalize this restriction in the spec? We don't, for example,
> > currently detail a restriction around the fact that you can't change a
> > generic table's format.
> >
> > Sure, we don't have to mention this in the spec.
> >
> > > generic tables are a catch-all type not specific to any
> > format (including Iceberg)
> >
> > The generic table APIs today are clearly separated from the Iceberg table
> > APIs. I don't think we want to close the door on that, and that is also
> > why I think "generic" is a good name. However, if we want to move on to
> > include certain semantics for Iceberg tables, for example showing Iceberg
> > tables in list tables, the API endpoints will be repurposed, and I think
> > it would be more appropriate to do that in a V2 spec, because people will
> > have to use those APIs differently.
> >
> > > GenericTableEntity is the
> > type I'm most likely to look to for the conversion service, which means
> > it will indeed be used to represent Iceberg tables.
> >
> > For conversion, if we are converting a table to an Iceberg table and the
> > source table has only one root location, the target Iceberg table will
> > also have one root location, so I don't see a problem with this. If we
> > are converting from an Iceberg table to a target format that supports
> > only one location, I don't see a problem either.
> >
> > Even with the Iceberg table spec today, I believe the locations it has
> > are: the top-level location, metadata.path, and data.path. I don't think
> > that can be represented with an array of locations either, because an
> > array cannot tell which path is for metadata and which is for data, and I
> > don't think relying on the size and position of an array is a good idea.
> > That extra path information can continue to be represented in generic
> > tables using properties and the top-level location.
> > Even with all those location configurations, I don't think the Iceberg
> > spec captures all the locations a table can have, because every snapshot
> > can potentially write into a different location, and those are not
> > tracked anywhere by anyone today.
> > Furthermore, it might require more information than just a location; for
> > example, a location might need to be associated with a snapshot.
> > I know Dennis was discussing a multi-location spec for Iceberg, but the
> > information needed seems more complicated than just a list of locations.
> > Support for tables with multiple locations seems to me a bigger topic
> > that requires much more thought. Again, I am not saying we shouldn't
> > support it in the future, but I think we should put more thought into how
> > tables with multiple locations work before we start supporting them.
> >
> > > The multi-location support in Polaris seems not very well also, the
> > overlap check and credential vending seems all done with one location
> >
> > Sorry, I think I misread the caller of the code for the overlap check.
> > Dennis mentioned that we only use one location for credential vending,
> > but it might be for something else.
> >
> > Best Regards,
> > Yun
> >
> >
> >
> > On Thu, May 22, 2025 at 3:08 PM Eric Maynard <eric.w.mayn...@gmail.com>
> > wrote:
> >
> >> > i meant no two tables under the same catalog can have the same
> >> location
> >>
> >> This is a stricter requirement than we have for Iceberg tables. Are we
> >> really going to enforce this? How will we do it efficiently? If not,
> >> let's not put it in the spec.
> >>
> >> > we do not have any update support
> >>
> >> It would be trivial to add update support for generic entities. Why
> >> canonicalize this restriction in the spec? We don't, for example,
> >> currently detail a restriction around the fact that you can't change a
> >> generic table's format.
> >>
> >> > generic tables are designed for non-Iceberg tables today,
> >>
> >> I don't actually think this is true. There's nothing about generic
> >> tables that makes them more useful for Delta tables than Iceberg tables,
> >> for example. On the contrary, I initially proposed the name "generic" in
> >> part to capture that generic tables are a catch-all type not specific to
> >> any format (including Iceberg). More practically, GenericTableEntity is
> >> the type I'm most likely to look to for the conversion service, which
> >> means it will indeed be used to represent Iceberg tables.
> >>
> >> > The multi-location support in Polaris seems not very well also, the
> >> overlap check and credential vending seems all done with one location
> >>
> >> This is not true.
> >>
> >
>
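
For reference, the base-location rules proposed in the June 10 message
quoted above (the base location is a URI, it contains all files of the table,
and a missing location leaves management to the client) could be sketched
roughly as below. This is only an illustration, not code from Polaris or from
the linked PR: the function names are hypothetical, and matching on scheme,
authority, and a slash-terminated path prefix is an assumption about what
"includes all files" means. It does not normalize ".." segments.

```python
from typing import Iterable, Optional
from urllib.parse import urlparse


def is_under_base_location(base_location: str, file_uri: str) -> bool:
    """Return True if file_uri lies inside base_location (hypothetical helper)."""
    base = urlparse(base_location)
    target = urlparse(file_uri)
    # Scheme and authority (e.g. the bucket) must match exactly.
    if (base.scheme, base.netloc) != (target.scheme, target.netloc):
        return False
    # Compare against a slash-terminated prefix so that ".../tbl" does not
    # accidentally contain ".../tbl2/...".
    base_path = base.path.rstrip("/") + "/"
    return target.path.startswith(base_path)


def is_single_location_table(
    base_location: Optional[str], file_uris: Iterable[str]
) -> bool:
    """Check the proposed single-location rule for a set of table files."""
    if base_location is None:
        # No location provided: the proposal leaves location management to
        # clients or users, so there is nothing to check in this sketch.
        return True
    return all(is_under_base_location(base_location, uri) for uri in file_uris)
```

With a base location of s3://bucket/warehouse/tbl, a file at
s3://bucket/warehouse/tbl/data/f.parquet passes the check, while a file at
s3://bucket/warehouse/tbl2/f.parquet or in a different bucket makes the table
non-compliant under this reading.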
