> What options do we have other than URI? I think it's more an engine side
> concern.


If (as mentioned in previous emails) we use this location as an input into
generating vended credentials, then Polaris must be able to interpret it.

Therefore, it is not only an engine side concern.

> What's the concern here?


Interpreting locations means dealing with S3 syntax peculiarities.
Effectively, not all S3 locations comply with the URI RFC [1].
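
To illustrate the kind of peculiarity I mean, here is a minimal Python sketch. The bucket and key names are made up, `split_s3_location` is a hypothetical helper, and none of this is Polaris or Nessie code. S3 object keys may contain characters such as `#` or `?`, which a generic URI parser treats as fragment and query delimiters:

```python
from urllib.parse import urlsplit

# Hypothetical location; the bucket and key are illustrative only.
location = "s3://my-bucket/warehouse/tbl/part#1?v=2.parquet"

parts = urlsplit(location)
# A generic URI parser cuts the object key at '#'.
print(parts.path)      # /warehouse/tbl/part
print(parts.fragment)  # 1?v=2.parquet


def split_s3_location(loc: str) -> tuple[str, str, str]:
    """Split 'scheme://bucket/key' without applying URI rules to the key."""
    scheme, sep, rest = loc.partition("://")
    if not sep:
        raise ValueError(f"not a scheme://bucket/key location: {loc!r}")
    bucket, _, key = rest.partition("/")
    return scheme, bucket, key


# The plain string split keeps the whole key intact.
print(split_s3_location(location))
```

The plain split is only one possible approach; it deliberately ignores percent-encoding and any other URI semantics.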

Polaris may be able to avoid parsing locations for credential vending, but
if it is to do some "prefix" matching, I suspect it will have to deal with
S3 location syntax issues.
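
As a rough sketch of what such "prefix" matching might look like in Python, treating locations as plain strings rather than parsed URIs: the `normalize` and `is_under` helpers are hypothetical, and treating `s3`, `s3a` and `s3n` as equivalent is an assumption made for illustration, not something this thread has settled.

```python
# Assumption: these schemes all address the same S3 object store.
S3_SCHEMES = {"s3", "s3a", "s3n"}


def normalize(location: str) -> str:
    """Map S3-style scheme aliases to 's3' and leave everything else as-is."""
    scheme, sep, rest = location.partition("://")
    if sep and scheme.lower() in S3_SCHEMES:
        return "s3://" + rest
    return location


def is_under(table_location: str, file_location: str) -> bool:
    """True if file_location lies under table_location on a '/' boundary."""
    base = normalize(table_location).rstrip("/") + "/"
    return normalize(file_location).startswith(base)


# Different schemes and a '#' in the key still match the table prefix.
print(is_under("s3://b/wh/tbl", "s3a://b/wh/tbl/data/f#1.parquet"))
# A sibling table sharing a textual prefix does not match.
print(is_under("s3://b/wh/tbl", "s3://b/wh/tbl2/f.parquet"))
```

Appending the trailing `/` before comparing is what keeps `tbl2` from matching `tbl`; a bare `startswith` on the raw table location would accept it.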

This basically goes back to my first reply to this thread. I believe we need
to clarify the meaning and interpretation of the location property before
getting into more specific concerns.

[1] https://github.com/projectnessie/nessie/issues/8328

Cheers,
Dmitri.

On Wed, May 7, 2025 at 7:32 PM Yufei Gu <flyrain...@gmail.com> wrote:

> >
> > Another point: I'm pretty sure sooner or later users will want to move
> > their data to some other location. As an option users may want to write
> > new files into another location but keep old files in place.
>
> What's the concern here? This field is pretty much like the Iceberg table
> location, which points to all files under a generic table. It isn't related
> to how users relocate a table.
>
> Also: if the location is a URI, how do we deal with s3 vs. s3a for example?
>
> What options do we have other than URI? I think it's more an engine side
> concern. I'm OK if Polaris is opinionated about a certain scheme like "s3".
> We could even make the conversion on the Polaris client side even if the
> engines require other schemes.
>
> Yufei
>
>
> On Wed, May 7, 2025 at 3:54 PM Dmitri Bourlatchkov <di...@apache.org>
> wrote:
>
> >
> >
> > Also: if the location is a URI, how do we deal with s3 vs. s3a for
> > example?
> >
> > In Iceberg it is quite common for different engines to use different
> > access tools, which often leads to different URI schemes.
> >
> > Cheers,
> > Dmitri.
> >
> > On Wed, May 7, 2025 at 6:46 PM Eric Maynard <eric.w.mayn...@gmail.com>
> > wrote:
> >
> > > All good questions, Dmitri. I’m especially interested in the first one,
> > > as from what I understand Iceberg tables can have metadata and data at
> > > two different paths that we need to vend credentials for.
> > >
> > > For iceberg tables, we just use special properties to track these
> > > locations. I wonder if we couldn’t do the same for generic tables.
> > >
> > > On Wed, May 7, 2025 at 3:42 PM Dmitri Bourlatchkov <di...@apache.org>
> > > wrote:
> > >
> > > > Hi Yun,
> > > >
> > > > Please clarify the meaning of the value of the new location
> > > > attribute.
> > > >
> > > > - Is it one value or many?
> > > > - Is it a URI?
> > > > - Does it point to any particular file?
> > > > - Is it a common prefix of all files within a table?
> > > > - What happens when a value does not match these expectations?
> > > >
> > > > Thanks,
> > > > Dmitri.
> > > >
> > > > On 2025/05/07 21:50:19 yun zou wrote:
> > > > > Hi folks,
> > > > >
> > > > > I would like to propose adding an optional `location` field to the
> > > > > CreateGenericTable request and LoadGenericTable response.
> > > > >
> > > > > The `location` is the location for the table, which is common to
> > > > > most table formats including Iceberg, Delta, Hudi, csv, parquet etc.
> > > > > The location information is critical for loading the table at engine
> > > > > side, having a dedicated keyword could help improve the robustness
> > > > > for cross engine sharing, instead of relying on the properties
> > > > > passed by the client side.
> > > > >
> > > > > Furthermore, this information is also required to provide credential
> > > > > vending capabilities later.
> > > > >
> > > > > Here is the PR for adding the spec:
> > > > > https://github.com/apache/polaris/pull/1543
> > > > >
> > > > > Looking forward to your reply and feedback!
> > > > >
> > > > > Best Regards,
> > > > > Yun
> > > > >
> > > >
> > >
> >
>
