To be clear, this thread is about adding a field to the already-existing
type. It sounds like you’re advocating for adding even more fields?

On Tue, Jun 10, 2025 at 5:59 PM Laurent Goujon <laur...@dremio.com.invalid>
wrote:

> I appreciate the sizable amount of effort being put here but isn't the
> generic table proposal as it is today too generic? Without some description
> of the available types, without some metadata information (like the
> schema), without some protocol on how a client should actually interpret
> the content/interact with the table, aren't we fragmenting instead of
> unifying?
>
> Delta, for example, is working on a catalog protocol:
> https://github.com/delta-io/delta/issues/4381
>
> Laurent
>
> On Tue, Jun 10, 2025 at 5:01 PM Eric Maynard <eric.w.mayn...@gmail.com>
> wrote:
>
> > After this latest round of changes, it looks good to me too! Thanks for
> > working on this.
> >
> > On Tue, Jun 10, 2025 at 4:56 PM Yufei Gu <flyrain...@gmail.com> wrote:
> >
> > > Sounds good to me.
> > > Yufei
> > >
> > >
> > > On Tue, Jun 10, 2025 at 3:53 PM yun zou <yunzou.colost...@gmail.com>
> > > wrote:
> > >
> > > > Hi Team,
> > > >
> > > > Thanks a lot for all the valuable feedback!
> > > >
> > > > > I want to bump this thread up and see if we can agree on the direction
> > > > > for moving forward.
> > > >
> > > > > For the V1 generic table spec, we would like to start with support for a
> > > > > single location, and leave multi-location support as an open discussion
> > > > > that could be introduced later.
> > > >
> > > > > A new base-location field will be added to the generic table spec with the
> > > > > following description:
> > > > > - The base location is in URI format.
> > > > > - The table base location is a location that includes all files for the
> > > > > table.
> > > > > - A table with multiple disjoint locations (i.e. containing files that are
> > > > > outside the configured base location) is not compliant with the current
> > > > > generic table support.
> > > > > - If no location is provided, clients or users are responsible for
> > > > > managing the location.
> > > >
> > > > > We will also add a dedicated webpage for Polaris Generic Table to describe
> > > > > all functionality and key fields clearly.
> > > >
> > > > > If there is no objection to the current plan, we would like to move on to
> > > > > the PR review:
> > > > https://github.com/apache/polaris/pull/1543/files
> > > >
> > > > Best Regards,
> > > > Yun
> > > >
> > > >
> > > >
> > > > On Thu, May 22, 2025 at 7:32 PM yun zou <yunzou.colost...@gmail.com>
> > > > wrote:
> > > >
> > > > > > > This is a stricter requirement than we have for Iceberg tables. Are we
> > > > > > really going to enforce this? How will we do it efficiently? If not, let's
> > > > > > not put it in the spec.
> > > > >
> > > > > > Efficiency is a good point: if we are supporting arbitrary nested
> > > > > > namespaces, efficiency is definitely a concern. Maybe we can restrict that
> > > > > > for generic tables, but I think it would be good for us to stay consistent
> > > > > > with Iceberg tables on this, since we share the namespace concept.
> > > > > > We can exclude this from the spec. However, I do think it is the right
> > > > > > restriction to put on both Iceberg and generic tables for a better security
> > > > > > guarantee. Maybe we can have a separate discussion on this topic.
> > > > >
> > > > > > > It would be trivial to add update support for generic entities. Why
> > > > > > canonicalize this restriction in the spec? We don't, for example, currently
> > > > > > detail a restriction around the fact that you can't change a generic
> > > > > > table's format.
> > > > >
> > > > > Sure, we don't have to mention this in the Spec.
> > > > >
> > > > > > generic tables are a catch-all type not specific to any
> > > > > format (including Iceberg)
> > > > >
> > > > > > Generic Table APIs today have a clear separation from Iceberg table APIs.
> > > > > > I don't think we want to close the door on that, and that is also why I
> > > > > > think "generic" is a good name. However, if we want to move on to
> > > > > > include certain semantics for Iceberg tables, for example, showing Iceberg
> > > > > > tables in list tables, the API endpoints will have to be repurposed, and
> > > > > > I think it would be more appropriate to do that in a V2 spec, because
> > > > > > people will have to use those APIs differently.
> > > > >
> > > > > > > GenericTableEntity is the
> > > > > > type I'm most likely to look to for the conversion service, which means it
> > > > > > will indeed be used to represent Iceberg tables.
> > > > >
> > > > > > For conversion, if we are converting a table to an Iceberg table, and the
> > > > > > table only has one root location, the target Iceberg table will also have
> > > > > > one root location, so I don't see a problem with this. If we are converting
> > > > > > from an Iceberg table to a target format that only supports one location,
> > > > > > I don't see a problem either.
> > > > >
> > > > > > Even with the Iceberg table spec today, I believe the locations it has are:
> > > > > > top-level location, metadata.path, and data.path. I don't think that can be
> > > > > > achieved with an array of locations either, because an array cannot tell
> > > > > > which path is for metadata and which path is for data. I don't think
> > > > > > relying on the size and position of an array is a good idea, and that extra
> > > > > > path information can continue to be represented with generic tables using
> > > > > > properties and the top-level location.
> > > > > > Even with all those location configurations, I don't think the Iceberg spec
> > > > > > captures all locations a table can have, because every snapshot can
> > > > > > potentially write into a different location, and those are not tracked
> > > > > > anywhere by anyone today.
> > > > > > Furthermore, it might require more information than just a location; for
> > > > > > example, it might need to be associated with the snapshot.
> > > > > > I know Dennis was discussing a multi-location spec for Iceberg, but
> > > > > > the information needed seems more complicated than just a list of
> > > > > > locations.
> > > > > > Support for tables with multiple locations seems to me a bigger topic that
> > > > > > requires much more thought. Again, I am not saying we shouldn't support it
> > > > > > in the future, but I think we should put more thought into how tables with
> > > > > > multiple locations work before we start supporting them.
> > > > >
> > > > > > > The multi-location support in Polaris seems not very well also, the
> > > > > > overlap check and credential vending seems all done with one location
> > > > > >
> > > > > > Sorry, I think I misread the caller of the code for the overlap check.
> > > > > > Dennis mentioned that we only use one location for credential vending,
> > > > > > but it might be for something else.
> > > > >
> > > > > Best Regards,
> > > > > Yun
> > > > >
> > > > >
> > > > >
> > > > > > On Thu, May 22, 2025 at 3:08 PM Eric Maynard <eric.w.mayn...@gmail.com>
> > > > > > wrote:
> > > > >
> > > > > >> > i meant no two tables under the same catalog can have the same location
> > > > >>
> > > > > >> This is a stricter requirement than we have for Iceberg tables. Are we
> > > > > >> really going to enforce this? How will we do it efficiently? If not, let's
> > > > > >> not put it in the spec.
> > > > >>
> > > > >> > we do not have any update support
> > > > >>
> > > > > >> It would be trivial to add update support for generic entities. Why
> > > > > >> canonicalize this restriction in the spec? We don't, for example, currently
> > > > > >> detail a restriction around the fact that you can't change a generic
> > > > > >> table's format.
> > > > >>
> > > > >> > generic tables are designed for non-Iceberg tables today,
> > > > >>
> > > > > >> I don't actually think this is true. There's nothing about generic tables
> > > > > >> that makes them more useful for Delta tables than Iceberg tables, for
> > > > > >> example. On the contrary, I initially proposed the name "generic" in part
> > > > > >> to capture that generic tables are a catch-all type not specific to any
> > > > > >> format (including Iceberg). More practically, GenericTableEntity is the
> > > > > >> type I'm most likely to look to for the conversion service, which means it
> > > > > >> will indeed be used to represent Iceberg tables.
> > > > >>
> > > > > >> > The multi-location support in Polaris seems not very well also, the
> > > > > >> overlap check and credential vending seems all done with one location
> > > > >>
> > > > >> This is not true.
> > > > >>
> > > > >
> > > >
> > >
> >
>
