Re: [DISCUSS] REST: Way to query if metadata pointer is the latest

Gabor Kaszab Thu, 12 Dec 2024 05:53:27 -0800

Hi Iceberg Community,

It took me a while but I finally managed to upload the proposal for this as
an official 'Iceberg improvement proposal'. Thanks for the feedback so far!


https://github.com/apache/iceberg/issues/11766

Regards,
Gabor


On Fri, Nov 22, 2024 at 4:51 PM Taeyun Kim <[email protected]>
wrote:

> Hi,
>
> Since ETags are opaque values to the client, attributing any semantic
> meaning to them in the interaction between the client and server would, in
> my opinion, constitute a misuse/abuse of the HTTP specification.
> On the other hand, the server can generate the ETag value as any string,
> as long as it conforms to the grammar defined in
> https://httpwg.org/specs/rfc9110.html#field.etag . Using the metadata
> location is likely the simplest option. For reference, based on the
> grammar, ETag values cannot include spaces. Therefore, if the metadata
> location contains spaces, it may need to be encoded. The same goes for
> double quotation marks. (I just found this out after looking it up.)
> Anyway, in my opinion, the client must ignore any semantic meaning
> associated with the value.
>
> Thank you.
>
> -----Original Message-----
> From:  "Zoltán Borók-Nagy" <[email protected]>
> To:      <[email protected]>;
> Cc:
> Sent:  2024-11-22 (금) 19:57:08 (UTC+09:00)
> Subject: Re: [DISCUSS] REST: Way to query if metadata pointer is the latest
>
> Hi,
>
> Separate version information forces the clients to manage a Table ->
> VersionIdentifier mapping which adds unnecessary complexity and can be
> error-prone.
>
> If the VersionIdentifier is embedded in the Table object then the
> application logic is much simpler, and the Catalog interface is not
> only simpler, but also hard to use incorrectly.
> Though this approach slightly increases the size of the Table objects.
> And touching the Table interface might encounter some resistance, even
> if it is only an extension.
>
> Yeah, VersionIdentifier doesn't need to be a String, it could be an
> Object, or an empty interface, and the Catalog implementation could
> cast it to some catalog-specific VersionIdentifierImpl.
>
> loadTableIfChanged() throwing UnsupportedOperationException is
> reasonable, as clients can easily fallback to loadTable. In my mind I
> had a use case where we cache tables without any refresh checks for a
> configured TTL, and after expiration we invoke reloadTable() anyway.
> But this use case can also be implemented even if loadTableIfChanged()
> throws exceptions, making this approach more flexible.
>
> About metadata_location as ETag: I don't have a strong opinion here,
> not sure what could go wrong if we do this. If we start with this
> approach we don't even need a VersionIdentifier for Tables, making the
> whole proposal more lightweight.
>
> Thanks Gabor for driving this and putting together a proposal!
>
> Cheers,
>     Zoltan
>
> On Fri, Nov 22, 2024 at 11:42 AM Gabor Kaszab <[email protected]>
> wrote:
> >
> > Hi Taeyun,
> >
> > Thanks for the writeup! Let me reflect to some areas:
> >
> >> the caller manages the version identifier separately.
> >
> > Since the callers of this interface would be the query engines
> themselves most of the cases, this would mean that Impala, Spark, Hive,
> Trino, etc. would need to implement their way of storing and updating
> VersionIdentifiers. This would push unwanted complexity to the client side.
> I'm against this. I'd be ok to include this VersionIdentifier (or
> CatalogVersion, or doesn't matter how we called this) into the Table object
> and sort of hide it from the clients if the community allows us to do so.
> >
> >> caller must rely on exception handling
> >
> > I don't think this approach is new. Even the current loadTable() API
> throws an exception if the table doesn't exist for instance. We can
> similarly throw an exception if the freshness-aware table loading is not
> feasible for some reason.
> >
> >> I'm against ... where the function always loads and returns a new Table
> when freshness checks aren’t possible
> >
> > We are in agreement here, I'm also against this. The user expects better
> performance when calling this interface compared to keeping calling the
> regular loadTable(). So if freshness checks aren't possible, let's say
> because that catalog implementation doesn't support it, the user should get
> an exception. I don't think that there is a need to call
> canCheckFreshness() before. That's just extra noise in the interface.
> >
> >> The server is free to assign any value to the ETag. This means the
> client should not attempt to interpret the content of the ETag.
> >
> > With the REST spec we can have a control of what the implementations
> should put into this ETag header. If we articulate in the spec that the
> ETag should be the metadata location then the implementations have to
> follow this contract, hence the clients could give semantics to the ETag
> and use it for the metadata location.
> >
> > I wonder if you have any proposal for the content of the ETag apart from
> it can be any value. Another approach that comes to my mind is to create a
> LoadTableResponse object on the server side, hash it, and then the hash of
> the response body could be used as an ETag. However, for that the server
> has to construct the LoadTableResponse unconditionally, even though in some
> cases this won't be sent back to the client. With this the server has to
> read and parse the full table metadata in order to judge if the table has
> changed or not. This means that there is going to be the same load on the
> server as if the user were calling the regular loadTable() interface and
> just the amount of data sent back on the network would be less.
> > Hence I propose that even though in theory the REST servers could use
> anything as an ETag, the metadata location seems a pretty convenient
> content for that header, and then we don't have to do a full table metadata
> read on the server side, just get the metadata location that is most
> probably cached anyway in memory.
> >
> > Next steps:
> > I think our proposals are coming close to each other, even though we
> have some disagreements on some details. I feel that we have really low
> activity on this thread from people with decision making privileges so
> let's do this:
> > Let me put together an improvement proposal as advised by Fokko, and
> let's hope it will attract people having binding votes so that we can come
> to a conclusion on all details. What do you think?
> >
> > Regards,
> > Gabor
> >
> > On Fri, Nov 22, 2024 at 3:06 AM Taeyun Kim <[email protected]>
> wrote:
> >>
> >> Hi,
> >>
> >> - On the Function:
> >>
> >> The function signature I propose is as follows (slightly modified from
> my previous suggestion):
> >>
> >> Option(Table, Option(VersionIdentifier))
> loadTableIfChanged(TableIdentifier, Option(VersionIdentifier))
> >>
> >> The key difference from Gabor’s proposal is that the caller manages the
> version identifier separately. For example, in the case of a REST catalog,
> the VersionIdentifier could be an ETag, while in other catalogs, it could
> be a metadata location. The important point is that the caller doesn’t
> interpret the identifier. In Java, the type for the version identifier
> could even be Object, allowing it to represent the Table itself if
> necessary.
> >> With this signature, the callee can return None as the
> VersionIdentifier to signal that freshness checks are not possible for the
> table, effectively informing the caller that a reload is unavoidable. (In
> Java, this could be implemented with Optional or simply using a nullable
> value as before.)
> >> In Gabor’s proposal, the caller must rely on exception handling to
> determine if freshness checks are possible. However, exceptions are
> typically used for unexpected issues, so using them for this purpose might
> feel slightly awkward.
> >> That said, as long as there’s a way for the caller to know whether
> freshness checks are possible, the exact function signature might not be
> critically important.
> >> Another key point is that the caller provides the basis for the
> freshness check when invoking the function. Different callers might hold
> distinct versions, so it’s important for the caller to supply the version
> information. In my proposal, the caller provides this explicitly via the
> VersionIdentifier. In Gabor’s proposal, version information would likely
> need to be embedded within the Table object.
> >> For Gabor’s approach, adding a non-static method to the Table class
> like the following could allow the caller to pre-check freshness
> capabilities:
> >>
> >> bool Table.canCheckFreshness()
> >>
> >> On the other hand, I’m against an implementation of
> loadTableIfChanged() where the function always loads and returns a new
> Table when freshness checks aren’t possible. This approach prevents the
> caller from handling caching behavior separately based on the availability
> of freshness checks. This is similar to Gabor’s concern about losing
> control of caching to the 4) and 5) layers.
> >> Therefore, if freshness checks are not possible, I agree with Gabor
> that the function (at least in Java) should throw an
> UnsupportedOperationException.
> >> While freshness checks are relatively straightforward to implement
> (e.g., by comparing metadata locations), and all catalogs may eventually
> support this feature soon after the API is introduced, it’s not a
> guarantee. For example, a RESTClient library class that supports freshness
> checks might be used to connect to an older REST catalog server that
> doesn’t support the new freshness checking specification. The client may
> not have the authority to upgrade the server. BTW, the RESTClient can
> determine that the server doesn’t support freshness checks based on the
> absence of these HTTP caching headers.
> >>
> >> - On ETag Content:
> >>
> >> The server is free to assign any value to the ETag. This means the
> client should not attempt to interpret the content of the ETag.
> >> As I mentioned before, if the REST catalog API uses ETags, it’s
> essential that no semantic meaning is attributed to their values between
> client and server.
> >>
> >> Thank you.
> >>
> >>
> >> -----Original Message-----
> >> From: "Gabor Kaszab" <[email protected]>
> >> To: <[email protected]>;
> >> Cc:
> >> Sent: 2024-11-21 (목) 21:06:48 (UTC+09:00)
> >> Subject: Re: [DISCUSS] REST: Way to query if metadata pointer is the
> latest
> >>
> >>
> >> Hey,
> >>
> >> I think there is one open question here where we disagree: It's the
> proposed function on the Catalog API (not the REST spec). I don't think we
> can ever include a parameter like ETag at this level of abstraction. The
> Catalog API is common for all the catalog implementations and is not just
> for REST or for catalogs that use HTTP. Since ETag is an HTTP specific
> detail, hence I said it's an implementation detail and we can't include it
> to the Catalog level API. It is relevant for the proposed REST spec changes
> and the HTTPClient implementation within the REST client, but it is most
> probably not relevant for other catalog types like HiveCatalog,
> HadoopCatalog, etc.
> >>
> >>
> >> In terms of the Catalog level API (levels 2) and 3) in my previous
> mail) I think this new API should be used only for freshness aware table
> loading and we shouldn't fall back to regular table loading if this is not
> implemented by a catalog. I find the other naming more verbose for this
> purpose:
> >>
> >>
> >> Table loadTableIfChanged(TableIdentifier, Table);
> >>
> >>
> >> Here, the Table parameter could be null if the client loads it the
> first time, hence the need for a TableIdentifier parameter. If a catalog
> doesn't have a freshness aware table loading mechanism, it would throw an
> UnsupportedOperationException. This would avoid confusion whether each call
> on this API is freshness aware or not.
> >>
> >>
> >> Note, there is no ETag included here explicitly in the function
> signature.
> >> - Initially this is how I thought this should work:
> >> The level of abstraction takes care of ETags where they are relevant,
> so in our case it's the REST client (most probably level 3) in my prev
> mail). This level could have a mapping between TableIdentifiers and latest
> known ETags. So when a freshness aware table loading request comes to the
> REST client it looks up the ETag using the TableIdentifier and uses it for
> the HTTP header.
> >> After talking with Zoltan we found some risks with this approach in
> concurrent scenarios, where the first thread gets the latest table, sets
> the latest ETag within this mapping, but then other threads still holding
> an old version of the table could get stuck with this old version since the
> REST client's ETag is at the latest so no actual reload would be performed.
> >>
> >> - Alternatives:
> >> 1) The Table class, or in fact the inherited classes should have a
> field that stores the 'catalogVersion' as suggested by Zoltan. For REST
> catalogs this can store the ETag, for other catalogs some other information
> for the same purpose.
> >> 2) I checked this description of ETags, and even though we discussed
> earlier that this is some server generated information, for me it seems
> that it can be basically anything:
> >> "There are no restrictions on how the server must generate the value,
> so servers are free to set the value based on whatever means they choose —
> such as a hash of the body contents or a version number." So basically an
> ETag can also be a metadata location String as suggested by Yufei (if I'm
> not mistaken). We can also go with this approach and then there is no need
> for a new field within the Table class.
> >>
> >>
> >> Looking forward to hearing your thoughts!
> >> Gabor
> >>
> >>
> >> On Thu, Nov 21, 2024 at 10:03 AM Zoltán Borók-Nagy <
> [email protected]> wrote:
> >>
> >> Sorry, one more thing about the methods:
> >>
> >>   Table reloadTable(Table); // or,
> >>   Table reloadTable(TableIdentifier, Table) // where Table could be NULL
> >>
> >> I want to highlight that it is super easy to provide a default
> >> implementation which just loads the table. Then later, catalog
> >> implementations can just add their clever tricks to make it more
> >> efficient.
> >>
> >> Cheers,
> >>     Zoltan
> >>
> >> On Thu, Nov 21, 2024 at 9:53 AM Zoltán Borók-Nagy <
> [email protected]> wrote:
> >> >
> >> > Hi,
> >> >
> >> > I agree with Gabor that the support of efficiently reloading Iceberg
> >> > tables is a generic problem that applies to all catalog
> >> > implementations.
> >> > I also think that the programming API, especially the Iceberg Java
> >> > library is very important, as almost all Iceberg clients use this
> >> > library to interact with Iceberg tables, no matter which catalog they
> >> > reside in.
> >> > Even the engines that are mostly written in C++ (Impala, Starrocks,
> >> > Doris) interact with Iceberg tables from their Java frontends using
> >> > the Iceberg Java library.
> >> >
> >> > "Since libraries are open-source, I can modify them as needed for my
> >> > use case" - if you want to maintain a private fork, then sure,
> >> > otherwise you really want to avoid introducing breaking changes. Also,
> >> > you want to introduce new features in a way that is acceptable for the
> >> > community. In that sense, modifying a library's interface is not much
> >> > easier than modifying a server's interface. Of course, clients of a
> >> > library have control over when to upgrade, a privilege you don't
> >> > always have for server APIs, but this is why API versioning was
> >> > invented, anyway, we are diverging from the main topic here.
> >> >
> >> > Since this reloadTable() method could be useful for other Catalog
> >> > implementations as well, I think we would like to add a new method to
> >> > org.apache.iceberg.catalog.Catalog that doesn't take any
> >> > implementation-specific detail about the underlying catalog. To
> >> > overcome this, catalogs could embed catalog-specific information into
> >> > the Table object when they initially load the table, e.g. "String
> >> > catalogVersion". In the case of the REST Catalog the catalogVersion
> >> > string would be the ETag. Other catalogs might not even need to add
> >> > anything, as the metadata_location of the Table object is sufficient.
> >> >
> >> > This way the API would be simple and generic:
> >> >
> >> >   Table reloadTable(Table); // or,
> >> >   Table reloadTable(TableIdentifier, Table) // where Table could be
> NULL
> >> >
> >> > Cheers,
> >> >     Zoltan
> >> >
> >> > On Thu, Nov 21, 2024 at 1:51 AM Taeyun Kim <
> [email protected]> wrote:
> >> > >
> >> > > Hi Gabor,
> >> > >
> >> > > On HTTP Caching:
> >> > >
> >> > > If an HTTP client library performs caching by default and doesn’t
> allow disabling it, I believe that library shouldn’t be used - at least in
> the context of this discussion.
> >> > > The kind of HTTP client library I have in mind is one that handles
> encoding and decoding of HTTP headers and body, as well as connection
> pooling. The responsibility for interpreting headers, status, and body
> content should remain with the application. While caching support can be
> provided, it should be optional.
> >> > > When using a library that behaves as I described, the issues you
> mentioned in points 4) and 5) shouldn’t arise, as the library wouldn’t
> interfere with caching.
> >> > > For reference, the Rust reqwest crate (which Iceberg-Rust appears
> to use) seems to operate as expected in this regard.
> >> > >
> >> > > On Programming Languages and APIs:
> >> > >
> >> > > One of my points was that there doesn’t seem to be a reason to
> center the discussion around Java (and its libraries).
> >> > > BTW, I don’t think it’s necessary for the functions in the
> iceberg-rust library to be identical to those in the Java library. Optimal
> solutions may vary by language, and library developers may have different
> design goals.
> >> > > Personally, my primary focus is on the REST catalog API
> specification, rather than language-specific library APIs. (To avoid
> confusion, I’ll refer to the REST catalog API as the "specification" from
> here on.)
> >> > > Library APIs are (merely) implementations designed to make the
> specification easier to use. Since libraries are open-source, I can modify
> them as needed for my use case (and, in fact, I’ve made modifications to
> iceberg-rust for my purposes). However, the specification defines the
> interface between different applications or servers, making it immutable
> for practical purposes.
> >> > >
> >> > > On ETags:
> >> > >
> >> > > The decision to use ETags is not just an implementation detail - it
> is part of the specification itself. In my view, it is far more significant
> than the signature of a library API function. I’ve outlined the reasons for
> this above.
> >> > >
> >> > > On the Proposal:
> >> > >
> >> > > I agree that the current function (loadTable(TableIdentifier))
> cannot be freshness-aware. This is expected, as the caller doesn’t provide
> the version it holds, leaving the callee with no basis for comparison.
> >> > > On the other hand, the proposed new function signature doesn’t seem
> to provide a way for the caller to supply ETags (or equivalent identifiers
> representing specific table versions for other catalog types). Is such
> information intended to be embedded within the Table structure?
> >> > > To me, it seems clearer to explicitly provide such information
> (like ETags) rather than embedding it in the Table structure. That said, I
> might be misunderstanding the intention here.
> >> > >
> >> > > Thank you.
> >> > >
> >> > >
> >> > > -----Original Message-----
> >> > > From: "Gabor Kaszab" <[email protected]>
> >> > > To: <[email protected]>;
> >> > > Cc:
> >> > > Sent: 2024-11-19 (화) 21:26:01 (UTC+09:00)
> >> > > Subject: Re: [DISCUSS] REST: Way to query if metadata pointer is
> the latest
> >> > >
> >> > >
> >> > > Hi,
> >> > >
> >> > >
> >> > > Thanks for sharing your view, Taeyun! I think there are many levels
> of representation here and we might not mean the same with our points. I
> think in general an interaction between a query engine and an Iceberg REST
> catalog has these different layers:
> >> > > 1) The engine (Impala, Spark, Trino, etc.).
> >> > > 2) Catalog API of the Iceberg lib offers loadTable(TableIdentifier)
> that returns a Table object. Different language implementations seem to
> have the same API (Java, Rust, etc.).
> >> > > 3) The particular implementation of a catalog that implements the
> above loadTable(TableIdentifier) function. In this example the RESTCatalog
> / RESTSessionCatalog.
> >> > > 4) RESTClient implemented by HTTPClient (used by the REST catalog)
> to communicate with the REST server (still implemented within Iceberg)
> >> > > 5) The external HTTPClient
> (org.apache.hc.client5.http.impl.classic.CloseableHttpClient) that
> orchestrates the HTTP traffic between the client and the server
> >> > >
> >> > >
> >> > > Let me reflect on your comments:
> >> > > - HTTP caching
> >> > > With the above layers in mind if I'm not mistaken HTTP Caching
> would be configured in 4) and the actual caching of HTTP responses would be
> in 5). This is what I meant by HTTP layer. With HTTP Caching the control of
> how long a cached TableMetadata is stored will no longer be in 1) and would
> be in 4) - 5). I don't think that any of the engines that cache table
> metadata would want to have this loss of control.
> >> > >
> >> > >
> >> > > - Programming language
> >> > > I'm not sure I get your point with this. The Catalog API seems the
> same regardless of programming language (See the links for 2) ).
> >> > >
> >> > >
> >> > > - ETags
> >> > > An ETag is an implementation detail that is relevant for HTTP
> communication. We can't extend the Catalog API in 2) nor in 3) with
> functions that are aware of ETags (e.g. return ETags or accept ETags as
> param). Those APIs are common for all the Catalog implementations including
> ones that don't leverage ETags for HTTP traffic.
> >> > >
> >> > >
> >> > > Proposal:
> >> > > - Catalog API
> >> > > I don't think that the current Catalog.loadTable(TableIdentifier)
> API in 2) is suitable for a freshness-aware table loading use case. It
> wouldn't be transparent to the clients if that actual catalog
> implementation avoids reloading the table if it hasn't changed or if that
> catalog implementation reloads the table unconditionally with this API call.
> >> > > Also it doesn't seem straightforward what the API should return if
> the table is considered fresh. This API returns a Table object and in case
> we get a 304 without a body from the catalog server, we won't have a way to
> construct a Table object as a return value for this API. I already shared
> my concerns for caching the LoadTableResponses within 4) - 5)
> >> > >
> >> > >
> >> > > So I think we need a new API on the Catalog for this purpose.
> Thanks Zoltan for the naming suggestion, after I sent my mail yesterday I
> also thought that I could've come up with a more intuitive name.
> >> > > This can either be:
> >> > > a) Table reloadTable(TableIdentifier, Table)
> >> > > b) Table loadTableIfChanged(TableIdentifier, Table)
> >> > >
> >> > >
> >> > > With this Catalog level API we could provide the current known
> state of that particular table as a parameter, and if the client side of
> the catalog implementation finds that the table hasn't changed it can
> return this Table object for the current state. With this approach no
> caching would be needed in 2) - 5). It's up to the catalog implementation
> how it finds out if the table has been changed or not.
> >> > >
> >> > >
> >> > > - REST API, REST spec
> >> > > The REST API could use the ETag approach to check table freshness.
> As described in previous mails this could reduce the number of round trips
> to refresh a table to one without the need of separately checking the
> freshness. We could use the same endpoint as we do for the current
> loadTable(), with an additional mention of an optional ETag being used and
> also including the 304 into the possible responses.
> >> > >
> >> > >
> >> > > For this approach we have to cache the [TableIdentifier -> last
> ETag] mapping somewhere. I think 4), the Iceberg specific HTTPClient could
> be suitable for this purpose, however, this seems too low level for this
> purpose. We can also consider RESTSessionCatalog to cache the ETags.
> >> > >
> >> > >
> >> > > This is something to be considered, but for REST catalog
> implementations that don't support the ETag based implementation, they
> would just simply perform a regular loadTable operation, not bothering with
> sending 304 codes. We can also investigate if we should get an exception if
> that particular REST implementation doesn't support the ETag approach, so
> that clients could notice if there is no freshness-aware table loading
> under the hood.
> >> > >
> >> > >
> >> > > - Other catalog types
> >> > > Currently we focus on the REST catalog implementation but the above
> Catalog API proposal could work for other catalog types too. The internal
> implementation could be different, though. Initially they could throw a
> NotImplementedException.
> >> > >
> >> > >
> >> > > I hope this makes sense and I haven't missed any details or
> previous comments.
> >> > >
> >> > >
> >> > > Regards,
> >> > > Gabor
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > On Tue, Nov 19, 2024 at 5:17 AM Taeyun Kim <
> [email protected]> wrote:
> >> > >
> >> > > Hi,
> >> > >
> >> > > Here are my thoughts:
> >> > >
> >> > > - HTTP Layer: To my knowledge, there isn’t a separate "HTTP layer"
> in this context, so concerns about control over caching shouldn’t be an
> issue. The header approach I mentioned simply involves handling additional
> headers when using HTTP client libraries to interact with the REST API.
> >> > >
> >> > > - Programming Language: For reference, I don’t use Java - I use
> Rust and C++. Personally, I hope Iceberg’s specifications avoid including
> Java-specific features and that the cross-language compatible REST catalog
> becomes the primary catalog for Iceberg.
> >> > >
> >> > > - API Perspective: Given the above, I may not be in the best
> position to comment on Java APIs. However, regarding Gabor’s proposed API
> (Table loadTable(Table existingTable)), I believe it would be difficult for
> an ETag-based REST catalog to support this API since it cannot provide an
> ETag. Instead, I’d like to suggest an alternative API:
> >> > > Option<Table, tag> loadTableIfNoneMatch(TableIdentifier,
> Option<tag>)
> >> > > Initially, the client would provide None as the tag. If the tag is
> not None and matches the latest table tag, the API returns None (= not
> updated). If the tag is None or does not match the latest table tag, the
> API returns a new (Table, tag) pair.
> >> > >
> >> > > Thank you.
> >> > >
> >> > >
> >> > > -----Original Message-----
> >> > > From: "Zoltán Borók-Nagy" <[email protected]>
> >> > > To: <[email protected]>;
> >> > > Cc:
> >> > > Sent: 2024-11-19 (화) 03:16:05 (UTC+09:00)
> >> > > Subject: Re: [DISCUSS] REST: Way to query if metadata pointer is
> the latest
> >> > >
> >> > >
> >> > > Hey Everyone,
> >> > >
> >> > >
> >> > > Thanks Gábor, I think the proposed interface would be very useful
> to any engine that employs caching, e.g. Impala.
> >> > > And it is pretty neat that it is catalog-agnostic, i.e. we just
> give all the information we have about the table and let the catalog
> implementation efficiently reload it.
> >> > >
> >> > >
> >> > > I might have a nitpick suggestion about the name to clearly express
> the intent: loadTable -> reloadTable (or, refreshTable)
> >> > >
> >> > >
> >> > > Cheers,
> >> > >     Zoltan
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > On Mon, Nov 18, 2024 at 5:17 PM Gabor Kaszab <
> [email protected]> wrote:
> >> > >
> >> > > Hi Iceberg Community,
> >> > >
> >> > >
> >> > > This is a great conversation so far, and thanks everyone for the
> valuable inputs!
> >> > > I'd like to articulate 2 things that we have to keep in mind with
> the design:
> >> > >
> >> > >
> >> > > 1: There are 2 interfaces here that we should consider:
> >> > > What I mean by this is that so far we have been talking about the
> REST spec, more narrowly the HTTP communication between Iceberg's REST
> client and the REST server. I think the proposed solution with the ETag
> absolutely makes sense within this context.
> >> > > However, the usual way of a client interacting with an Iceberg
> catalog (including REST) is the Catalog API in the library. This API offers
> a loadTable(TableIdentifier) function that returns a Table object. With the
> above HTTP-based solution in mind I don't think we could give any
> meaningful results if the HTTP layer finds that the table hasn't changed. I
> argued already against pushing the caching responsibilities from the
> clients into the HTTP layer (mostly because of losing the control over the
> cache, and also observability won't be straightforward) so let's assume for
> now that we won't do caching in the HTTP layer, only execute the loadTable
> calls to the REST catalog by setting the ETag. In case we get a 304 we
> won't be able to construct a Table object to answer the
> Catalog.loadTable(TableIdentifier) call. We could return null or throw an
> exception but I don't find any of them appropriate.
> >> > >
> >> > >
> >> > > 2: There are catalog types other than REST
> >> > > I started this conversation focusing on the REST spec, but the more
> I think of this the more I feel that the same functionality should be
> offered for all the other catalog types too. Let's assume that we have an
> engine that caches table metadata and initially uses REST catalog. For such
> an engine the proposed solution would solve the problem of checking table
> freshnes and also reloading the table metadata. A simple code for that
> could be enough if we configured our HTTP client properly (just sketched a
> simple example):
> >> > >
> >> > >
> >> > > tableCache_.put(catalog_.loadTable(tableIdentifier));
> >> > >
> >> > >
> >> > > Also let's assume we solve the issue in 1) and we can answer such a
> call even if we get 304 from the server as the table is unchanged. So with
> this solution with the REST catalog we can be sure that the table is only
> loaded from the catalog if changed (or the age expired). But what if we
> configure another catalog, let's say HiveCatalog. The very same code for
> that catalog would trigger a table reload for every execution causing
> unexpected performance issues.
> >> > > I have to double check but I assume that this HTTP approach
> wouldn't be feasible for other catalog types unfortunately.
> >> > >
> >> > >
> >> > > I hope these arguments make sense :)
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > As a partial solution this is what I have in mind:
> >> > > We can add another function into the catalog API for this purpose.
> Let's say something like this:
> >> > > Table loadTable(Table existingTable);
> >> > >
> >> > >
> >> > > What advantages I see with this:
> >> > > - This could solve issue 1) above. In case the table hasn't changed
> we can simply return 'existingTable' without using HTTP Cache.
> >> > > - The clients wouldn't need to explicitly call for isLatest() and
> such functions to check for freshness, and they wouldn't need to trigger
> table reloading for themselve. This API would be expected to cover this
> under the hood.
> >> > > - The current Catalog.loadTable(TableIdentifier) API wouldn't be
> enough for all the catalog types on it's own, but with this one each
> catalog implementations (e.g. HiveCatalog, REST catalog, etc.) then can
> implement their own way of doing freshness checks and table reloads. For
> REST we could follow the HTTP ETag approach, while for other catalogs we
> could follow other approaches.
> >> > >
> >> > >
> >> > > Regards,
> >> > > Gabor
> >> > >
> >> > >
> >> > > On Mon, Nov 18, 2024 at 8:48 AM Shani Elharrar
> <[email protected]> wrote:
> >> > >
> >> > > You're totally right. Perhaps using a "Content-Location" header
> might be a better fit for that.
> >> > >
> >> > >
> >> > > Shani.
> >> > >
> >> > >
> >> > > On 18 Nov 2024, at 9:27, Taeyun Kim <[email protected]>
> wrote:
> >> > >
> >> > >
> >> > > 
> >> > > Hi,
> >> > >
> >> > >
> >> > > Here are my thoughts:
> >> > >
> >> > >
> >> > > - The value of ETag is (as far as I know) defined as an opaque
> string by the specification, meaning the client shouldn’t interpret or
> assign any significance to it, regardless of what the server specifies.
> It’s best to avoid the client giving any particular meaning to the ETag
> value.
> >> > > - One major advantage of the header approach compared to other
> methods is that if an update has occurred, the updated content can be
> immediately included in the response without requiring an additional
> request. This saves one request-response round-trip (although It’s also
> possible to define a separate endpoint with the same functionality).
> >> > > - Since the Iceberg REST catalog server is effectively a type of
> HTTP server, at least in theory, it may be expected to handle HTTP cache
> and validation-related processes. The header approach can be seen as
> leveraging this mechanism appropriately.
> >> > > - The header approach doesn’t have to be limited to the
> /v1/{prefix}/namespaces/{namespace}/tables/{table} endpoint. It could also
> be applied to all GET-based endpoints, though this might broaden the scope
> significantly.
> >> > >
> >> > >
> >> > > Thank you.
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > -----Original Message-----
> >> > > From: "Shani Elharrar" <[email protected]>
> >> > > To: <[email protected]>;
> >> > > Cc: <[email protected]>;
> >> > > Sent: 2024-11-18 (월) 16:21:16 (UTC+09:00)
> >> > > Subject: Re: [DISCUSS] REST: Way to query if metadata pointer is
> the latest
> >> > >
> >> > >
> >> > > Using the metadata file name as ETag is nice way to go. In that
> case, adding HEAD method support to the loadTable endpoint will return the
> latest metadata pointer, which can be used to support "isLatest" without
> returning the body. It can be also leveraged in order to return the latest
> metadata location of the table.
> >> > >
> >> > >
> >> > > Shani.
> >> > >
> >> > >
> >> > > On 18 Nov 2024, at 8:52, Yufei Gu <[email protected]> wrote:
> >> > >
> >> > >
> >> > > 
> >> > >
> >> > > Hi Taeyun,
> >> > >
> >> > > Thank you for the clear explanation.
> >> > >
> >> > > I agree that the ETag solution is more suitable. If we were going
> that way, I'd propose a customized version number as an ETag—for instance,
> leveraging the metadata.json file name as the identifier.
> >> > >
> >> > > To summarize, HTTP caching relies on headers (e.g., ETag or
> Last-Modified) to validate whether a version is up-to-date, whereas the
> alternative approach proposed above uses an additional parameter for
> verification. From my perspective, there isn’t a fundamental difference
> between the two, so I’m OK with either.
> >> > >
> >> > > A couple of points to note:
> >> > >
> >> > > Both approaches would require changes to the "loadTable" endpoint.
> >> > > A minor advantage of HTTP caching is that it integrates seamlessly
> with browsers, but since most clients of the Iceberg REST catalog aren’t
> browsers, this may not be a significant factor.
> >> > > I’d also recommend considering the requirement to retrieve multiple
> tables(e.g., all tables under a namespace, or a list of table names) from
> the catalog. This requires a new endpoint and may not work with HTTP
> caching.
> >> > >
> >> > > Let me know your thoughts or if there’s anything else to consider.
> >> > >
> >> > > Yufei
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > On Sun, Nov 17, 2024 at 6:43 PM Taeyun Kim <
> [email protected]> wrote:
> >> > > Hi,
> >> > >
> >> > > To Gabor:
> >> > > It doesn’t seem necessary to interpret HTTP caching literally in
> this context.
> >> > > Simply using the HTTP headers defined by HTTP caching to check the
> freshness of metadata should be sufficient.
> >> > > There’s no requirement for the client to duplicate or store cached
> HTTP responses.
> >> > >
> >> > > To Yufei:
> >> > > As I understand it, the client doesn’t send its own timestamp but
> instead uses the timestamp originally provided by the server in the
> Last-Modified header.
> >> > > Therefore, clock synchronization issues should not be a concern.
> >> > >
> >> > > Here’s the general flow of HTTP cache validation based on
> If-Modified-Since:
> >> > >
> >> > > - Client: initial request:
> >> > >
> >> > > GET (url) HTTP/1.1
> >> > >
> >> > > - Server response:
> >> > >
> >> > > HTTP/1.1 200 OK
> >> > > Last-Modified: (date1)
> >> > > Cache-Control: no-store, no-cache, max-age=0, must-revalidate,
> proxy-revalidate
> >> > > (with response body)
> >> > >
> >> > > - Client: validation request:
> >> > >
> >> > > GET (url) HTTP/1.1
> >> > > If-Modified-Since: (date1)
> >> > >
> >> > > - Server response (if unchanged):
> >> > >
> >> > > HTTP/1.1 304 Not Modified
> >> > > Last-Modified: (date1)
> >> > > Cache-Control: no-store, no-cache, max-age=0, must-revalidate,
> proxy-revalidate
> >> > > (without response body)
> >> > >
> >> > > - Server response (if updated):
> >> > >
> >> > > HTTP/1.1 200 OK
> >> > > Last-Modified: (date2)
> >> > > Cache-Control: no-store, no-cache, max-age=0, must-revalidate,
> proxy-revalidate
> >> > > (with response body)
> >> > >
> >> > > However, using time-based freshness checks can present challenges,
> such as parsing time formats or synchronizing file update times across
> servers.
> >> > > To address these issues, HTTP cache validation based on ETag is
> also defined in the specification.
> >> > >
> >> > > Here’s the flow for ETag-based validation:
> >> > >
> >> > > - Client: initial request:
> >> > >
> >> > > GET (url) HTTP/1.1
> >> > >
> >> > > - Server response:
> >> > >
> >> > > HTTP/1.1 200 OK
> >> > > ETag: "(arbitrary string 1 generated by the server)"
> >> > > Cache-Control: no-store, no-cache, max-age=0, must-revalidate,
> proxy-revalidate
> >> > > (with response body)
> >> > >
> >> > > - Client: validation request:
> >> > >
> >> > > GET (url) HTTP/1.1
> >> > > If-None-Match: "(arbitrary string 1 generated by the server)"
> >> > >
> >> > > - Server response (if unchanged):
> >> > >
> >> > > HTTP/1.1 304 Not Modified
> >> > > ETag: "(arbitrary string 1 generated by the server)"
> >> > > Cache-Control: no-store, no-cache, max-age=0, must-revalidate,
> proxy-revalidate
> >> > > (without response body)
> >> > >
> >> > > - Server response (if updated):
> >> > >
> >> > > HTTP/1.1 200 OK
> >> > > ETag: "(arbitrary string 2 generated by the server)"
> >> > > Cache-Control: no-store, no-cache, max-age=0, must-revalidate,
> proxy-revalidate
> >> > > (with response body)
> >> > >
> >> > > The server can choose to use either If-Modified-Since or ETag for
> freshness validation.
> >> > > Alternatively, to simplify the implementation related to the
> Iceberg REST catalog, it might make sense to define only the more accurate
> ETag-based validation in the spec.
> >> > > For reference, RFC 9110 recommends specifying both ETag and
> Last-Modified. When both are provided, ETag takes precedence.
> >> > >
> >> > > Note on Cache-Control Headers:
> >> > > The Cache-Control values in the examples above are intended to
> ensure that the client validates freshness with the server on every
> request. Writing the header in this extended format is primarily to
> accommodate outdated HTTP/1.1 implementations. However, under the HTTP/1.1
> specification, the following is sufficient:
> >> > >
> >> > > Cache-Control: no-cache
> >> > >
> >> > > That’s all for now.
> >> > > Thank you.
> >> > >
> >> > >
> >> > > -----Original Message-----
> >> > > From: "Yufei Gu" <[email protected]>
> >> > > To: <[email protected]>;
> >> > > Cc:
> >> > > Sent: 2024-11-16 (토) 02:51:05 (UTC+09:00)
> >> > > Subject: Re: [DISCUSS] REST: Way to query if metadata pointer is
> the latest
> >> > >
> >> > >
> >> > >
> >> > > How does HTTP caching handle desynchronized clocks between clients
> and the server?
> >> > >
> >> > > At t0, the client gets the latest table version.
> >> > > At t1, the server makes a new commit.
> >> > > At t2, the client sends a request with a timestamp t2, but due to
> desynchronization, it refers to t0.
> >> > >
> >> > > The server may reply with 304 Not Modified, causing the client to
> think its cache is up-to-date and miss the commit at t1.
> >> > >
> >> > >
> >> > >
> >> > > Yufei
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > On Fri, Nov 15, 2024 at 6:37 AM Gabor Kaszab <
> [email protected]> wrote:
> >> > > Hi All,
> >> > >
> >> > >
> >> > > First of all it's great to see that there are others who could
> benefit from giving a solution to this problem. I appreciate all the
> comments and feedback so far.
> >> > > There were a number of different opinions, so let me start with
> summarizing the different topics that came up:
> >> > >
> >> > >
> >> > > New endpoint vs using an existing endpoint:
> >> > > Based on the answers (Fokko, Yufei) I had the impression that we
> should be careful when adding new REST endpoints, and we should examine the
> re-use of existing endpoints first. Let's do that then, and in case we
> don't find it feasible then we can still fall back to any of my initial
> proposals (isLatest() or metadataLocation()).
> >> > >
> >> > >
> >> > > Granularity of freshness checks:
> >> > > It was brought up (Dmitri) that we might not want to do the
> metadata freshness checks solely based on metadata location, but we should
> consider doing more granular freshness checks. I personally don't see much
> benefit of designing this solution like that, TBH, but seeing some
> use-cases could help us understand the motivation here.
> >> > > Let me share my opinion on some of the arguments:
> >> > >
> >> > >
> >> > > "A change in metadata location does not necessarily mean a change
> in metadata content"
> >> > >
> >> > >
> >> > > AFAIK whenever Iceberg creates a new metadata file there is some
> change in the metadata itself. There might not be a new snapshot, though in
> the cases of e.g. a schema/partition evolution. But even in these cases
> triggering a table reload could make sense to me (e.g. answering SHOW
> CREATE TABLE and similar queries). Additionally, I'd assume the number of
> metadata location changes that don't create a new snapshot is too
> negligible to optimize for.
> >> > > Dmitri, let me know if I misunderstood something.
> >> > >
> >> > >
> >> > > "it may still be beneficial to permit the client to ask for changes
> to specific areas of metadata"
> >> > >
> >> > > This seems like a use-case that the partial metadata loading
> proposal could solve. To identify the need to load a specific part of the
> metadata with partial metadata loading seems an overkill to design with my
> proposal, if this is what you have in mind. Also I found that the partial
> metadata loading proposal faces serious headwinds, so I wouldn't rely on it
> at the moment.
> >> > >
> >> > >
> >> > > Re-using tableExists
> >> > > I think there is a consensus here that tableExists returning a
> metadata location could work but seems more like a workaround and could be
> misleading for the users.
> >> > >
> >> > > Partial metadata loading could solve this:
> >> > > (Yufei) I agree, it would be perfect for my use-case and I'm
> following the discussion on the proposal. However, for me it seems, as I
> wrote above, that the proposal faces serious headwinds now and I honestly
> wouldn't expect a solution in the short term. But solving the freshness
> problems is a more urgent thing to solve, not just for myself and Impala
> but apparently to many other stakeholders in the community according to the
> interest on this thread.
> >> > > Hence, I propose to come up with a separate solution for freshness
> checks, and we can still move to using partial metadata loading once that's
> out.
> >> > >
> >> > >
> >> > > Use HTTPCache and If-Modified-Since with loadTable
> >> > > This solution seems to do the trick for us. Let me do some research
> myself to see if there are any difficulties implementing this. Currently, I
> have more questions than answers wrt this approach :)
> >> > > - The initial problem is to answer freshness questions for the
> cached tables on the client side. If we introduce HttpCaching wouldn't we
> introduce the same problem but on a different level of representation. We'd
> then need to decide the freshness/staleness of the cached data in the HTTP
> layer.
> >> > > - If we cache the HTTP responses for a loadTable then we
> essentially cache the content of the metadata.jsons including the snapshot
> and metadata log and everything, plus the snapshot list (and I think the
> manifests for the latest snapshot). I believe that the size of this can
> easily reach the low megabytes range in memory, so in total keeping them in
> the HTTP Cache for all the tables we have queried can easily mean that we
> keep a couple of GBs in memory just for this purpose.
> >> > > For engines that already cache table metadata wouldn't this mean
> that we will cache some parts of the metadata redundantly?
> >> > > - How would we decide what is the max-age of a cached table
> metadata in the HTTP Cache? Would it be configurable so that each engine
> could use whatever it prefers?
> >> > >
> >> > >
> >> > > Sorry if any of the questions doesn't make sense, I just want to
> make sure I understand all the aspects of this approach.
> >> > >
> >> > >
> >> > > An additional topic I have in mind:
> >> > > REST catalog vs other catalogs:
> >> > > Now we are focusing our discussion on the REST spec, but I think it
> would be beneficial to extend our focus and cover other catalog
> implementations too. I don't think that this problem of data freshness is
> specific to REST catalog, it could affect any table in any other catalog
> too.
> >> > >
> >> > >
> >> > > I'll continue my investigation wrt the proposals, I just wanted to
> flush out and sum up what we have now before the weekend.
> >> > >
> >> > >
> >> > > Regards,
> >> > > Gabor
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > On Fri, Nov 15, 2024 at 10:16 AM Jean-Baptiste Onofré <
> [email protected]> wrote:
> >> > > Hi,
> >> > >
> >> > > I like the idea and it makes sense. As soon as it's clearly stated
> in
> >> > > the spec (using If-Modified-Since header and 304 status code), it
> >> > > looks good to me.
> >> > >
> >> > > Thanks !
> >> > > Regards
> >> > > JB
> >> > >
> >> > > On Fri, Nov 15, 2024 at 1:58 AM Taeyun Kim <
> [email protected]> wrote:
> >> > > >
> >> > > > Hi,
> >> > > >
> >> > > > (Apologies if this email is a duplicate. This is my third
> attempt.)
> >> > > >
> >> > > > I also need a way to ensure that my table data is up-to-date. For
> now, I’m handling this by setting an expiration period after which I fetch
> the data again, regardless of its freshness.
> >> > > >
> >> > > > Here are my thoughts on the current suggestions. Please correct
> me if I've misunderstood any of the points.
> >> > > >
> >> > > > - isLatest(): This function could be inefficient since it would
> require an additional round-trip to fetch the metadata if it’s not
> up-to-date. This would result in two round-trips overall, which seems
> suboptimal.
> >> > > > - metadataLocation(): This has a similar issue as isLatest().
> BTW, according to the REST catalog API documentation for LoadTableResult
> schema, it states, "Clients can check whether metadata has changed by
> comparing metadata locations after the table has been created." (
> https://github.com/apache/iceberg/blob/3659ded18d50206576985339bd55cd82f5e200cc/open-api/rest-catalog-open-api.yaml#L3175)
> This suggests that if the metadata location has changed, the metadata can
> be considered updated.
> >> > > > - tableExists(): Based on the name, this function seems to serve
> a different purpose.
> >> > > >
> >> > > > Here is my suggestion:
> >> > > >
> >> > > > Since HTTP has built-in caching features (
> https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching), and REST
> catalogs operate over HTTP, it seems natural to leverage HTTP caching
> mechanisms. For example, HTTP includes the If-Modified-Since header and the
> 304 Not Modified status code. Using this approach, we could achieve data
> freshness with a single round-trip, fetching updated data only if there are
> modifications.
> >> > > >
> >> > > > What do you think about defining the spec in this direction?
> >> > > >
> >> > > > Thank you.
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > -----Original Message-----
> >> > > > From: "Yufei Gu" <[email protected]>
> >> > > > To: <[email protected]>;
> >> > > > Cc:
> >> > > > Sent: 2024-11-13 (수) 03:43:24 (UTC+09:00)
> >> > > > Subject: Re: [DISCUSS] REST: Way to query if metadata pointer is
> the latest
> >> > > >
> >> > > >
> >> > > >
> >> > > > Hi Gamber,
> >> > > >
> >> > > > Thanks for the proposal! Impala isn’t unique in needing this—I've
> seen similar requirements from other engines.
> >> > > >
> >> > > > As others pointed out, using the “tableExists” endpoint seems
> like a workaround. I don't consider it a permanent way forward. We could
> address this by either modifying the current load table endpoint or
> introducing a new one, but ideally, we should avoid adding endpoints for
> every specific need. With that, partial metadata loading seems like a
> strong approach here, we will need certain agreement though. I'd suggest
> the community consider the use cases seriously. We need a way forward.
> >> > > >
> >> > > > I’m also not too concerned about using metadata file paths to
> verify the latest table version; clients can simply extract metadata
> filenames, which include the UUID.
> >> > > >
> >> > > > Yufei
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Tue, Nov 12, 2024 at 7:46 AM Jean-Baptiste Onofré <
> [email protected]> wrote:
> >> > > >
> >> > > > Hi Fokko
> >> > > >
> >> > > > I like the idea, but I think it's more a workaround and could be
> >> > > > confusing for users :)
> >> > > >
> >> > > > Regards
> >> > > > JB
> >> > > >
> >> > > > On Tue, Nov 12, 2024 at 2:53 PM Fokko Driesprong <
> [email protected]> wrote:
> >> > > > >
> >> > > > > Hey Gabor,
> >> > > > >
> >> > > > > Thanks for raising this. While reading this, my first thought
> is to leverage the `tableExists` operation:
> >> > > > >
> https://github.com/apache/iceberg/blob/e3f39972863f891481ad9f5a559ffef093976bd7/open-api/rest-catalog-open-api.yaml#L1129-L1160
> >> > > > >
> >> > > > > This doesn't return anything today, but we could return a
> payload to the latest metadata.json.
> >> > > > >
> >> > > > > Looking forward to what others think.
> >> > > > >
> >> > > > > Kind regards,
> >> > > > > Fokko
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > Op di 12 nov 2024 om 14:33 schreef Shani Elharrar
> <[email protected]>:
> >> > > > >>
> >> > > > >> I recommend option (b), provided there is no partial metadata
> loading. We implemented option (b) internally to facilitate partial
> metadata loading, as we have tables with hundreds of thousands of
> snapshots. This results in metadata that occupies approximately 500 MB in
> memory (excluding the JsonNodes), which is a significant load for some of
> our services.
> >> > > > >>
> >> > > > >> Shani.
> >> > > > >>
> >> > > > >> On 12 Nov 2024, at 14:12, Gabor Kaszab <[email protected]>
> wrote:
> >> > > > >>
> >> > > > >> Hey Iceberg Community,
> >> > > > >>
> >> > > > >> Background:
> >> > > > >> Impala is designed in a way to cache the Iceberg table
> metadata (BaseTable objects in practice) for faster access. Currently,
> Impala is tightly coupled with HMS and in turn with the HiveCatalog, and in
> order to keep the cached table objects up-to-date there is a notification
> mechanism driven by HMS to notify Impala about any changes in the table
> metadata.
> >> > > > >> The Impala community is actively looking for ways to decouple
> HMS from Impala and provide a way to use Impala without the need for HMS,
> and get the Iceberg table metadata from other catalog Implementations
> mainly focusing now on REST catalogs.
> >> > > > >>
> >> > > > >> Problem to solve:
> >> > > > >> We identified a particular missing functionality in the
> current REST spec: For engines that cache table metadata currently there is
> no way to check if that table metadata is up-to-date or not, and whether
> the engine should reload the metadata for that table or not without getting
> a whole table object from the catalog. For this I think the REST catalog
> (but in fact I think this could apply to any other catalogs) should be able
> to answer a question like:
> >> > > > >> "Hi Catalog, I have this version of this table, is it
> up-to-date?"
> >> > > > >>
> >> > > > >> Proposal:
> >> > > > >> I've been following the discussion about partial metadata
> loading that could be also used to answer the above question, but I have
> the impression now that the conversation stopped making any progress.
> >> > > > >> So instead of waiting for partial metadata loading I propose
> to have an addition to the REST spec now to answer the question I raised
> above:
> >> > > > >>
> >> > > > >> a) boolean isLatest(TableIdentifier ident, String
> metadataLocation);
> >> > > > >> b) String metadataLocation(TableIdentifier ident);
> >> > > > >>
> >> > > > >> Any of the above 2 approaches could help engines to decide if
> they have to invalidate/reload particular table metadata in the cache. I
> personally would go for option a) but would be open to hear other opinions.
> >> > > > >>
> >> > > > >> I'd like to know if the community could support me extending
> the REST spec with any of the 2 options.
> >> > > > >>
> >> > > > >> Regards,
> >> > > > >> Gabor
> >> > > > >>
> >> > > > >>
> >> > > ![](
> https://innowireless.dooray.com/mail-receipts?img=7547413759332b66-3113a64e77e5c3fb-36ab7c8bd44b0e02-36ab7e7273f7887e.gif
> )
>

Re: [DISCUSS] REST: Way to query if metadata pointer is the latest

Reply via email to