gaborkaszab commented on PR #16319:
URL: https://github.com/apache/iceberg/pull/16319#issuecomment-4544021822

   Thanks for the explanation, @yadavay-amzn! I understood the same from my
first pass over the code for the two use cases.
   
   I'm not sure I can argue for keeping ETags in two different places, and I
don't see the use case for maintaining both table-cache-level and ops-level
freshness-aware loading. If we take a step back and ignore how this is
currently implemented, a user would expect the client not to do another full
table load after it has already downloaded the full metadata from the REST
server, unless the table has changed in the meantime. With this design we
might load the changed table twice, once for the ops and once for the table
cache. E.g.:
   1) First we load the table to populate the cache.
   2) The table changes after this.
   3) We call ops `refresh()`, which does a full table load.
   4) Then we call `catalog.load()`, which does the same full table load and
gets the same ETag as in 3).
   Users could rightfully say we shouldn't do a full load in step 4), because
we already loaded the latest table into the client in step 3).
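   To make the double load concrete, here is a minimal Python sketch. It
assumes a hypothetical in-memory server that answers a conditional load with
no body when the client's ETag still matches (similar to an HTTP 304).
`FakeRestServer`, `FreshnessAwareLoader` and `run_scenario` are illustrative
names only, not Iceberg APIs. It replays steps 1) to 4) and counts full loads,
once with separate ETag stores for ops and the table cache, and once with a
single shared store.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class FakeRestServer:
    """Hypothetical in-memory stand-in for a REST catalog with ETag support."""
    metadata: Dict[str, str] = field(default_factory=dict)
    versions: Dict[str, int] = field(default_factory=dict)
    full_loads: int = 0  # responses that carried the full metadata body

    def commit(self, table: str, metadata: str) -> None:
        self.metadata[table] = metadata
        self.versions[table] = self.versions.get(table, 0) + 1

    def load(self, table: str, if_none_match: Optional[str]) -> Tuple[Optional[str], str]:
        etag = f'"{table}-v{self.versions[table]}"'
        if if_none_match == etag:
            return None, etag  # like HTTP 304 Not Modified: no body sent
        self.full_loads += 1
        return self.metadata[table], etag


class FreshnessAwareLoader:
    """Hypothetical holder of the last seen (ETag, metadata) per table."""

    def __init__(self, server: FakeRestServer) -> None:
        self.server = server
        self.entries: Dict[str, Tuple[str, str]] = {}

    def load(self, table: str) -> str:
        cached = self.entries.get(table)
        body, etag = self.server.load(table, cached[0] if cached else None)
        if body is None:
            return cached[1]  # server confirmed our copy is still current
        self.entries[table] = (etag, body)
        return body


def run_scenario(shared: bool) -> Tuple[int, str]:
    server = FakeRestServer()
    server.commit("db.t", "metadata-v1")
    cache_loader = FreshnessAwareLoader(server)
    # shared=True: ops and the table cache consult one ETag store.
    ops_loader = cache_loader if shared else FreshnessAwareLoader(server)

    cache_loader.load("db.t")             # 1) populate the table cache
    server.commit("db.t", "metadata-v2")  # 2) the table changes
    ops_loader.load("db.t")               # 3) ops refresh(): full load
    latest = cache_loader.load("db.t")    # 4) catalog load
    return server.full_loads, latest


print(run_scenario(shared=False))  # (3, 'metadata-v2')
print(run_scenario(shared=True))   # (2, 'metadata-v2')
```

   With a single store, step 4) is answered using the ETag obtained in step
3), so the changed table is downloaded once instead of twice.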


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
