fbocse commented on a change in pull request #1465:
URL: https://github.com/apache/iceberg/pull/1465#discussion_r489704855



##########
File path: 
core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java
##########
@@ -289,19 +311,24 @@ int readVersionHint() {
       return Integer.parseInt(in.readLine().replace("\n", ""));
 
     } catch (Exception e) {
-      LOG.warn("Error reading version hint file {}", versionHintFile, e);
+      LOG.debug("Error reading version hint file {}", versionHintFile, e);

Review comment:
       @rdblue we've seen this happening with hadoop implementations that 
implement `create(overwrite=true)` by a server-side `delete` and `create` 
operation - so for a split micro-second we get a 404 on the version-hint.text 
file - so the API isn't consistent - however they do provide read-after-write 
guarantees on directory listing - so we've looked at solving this in a pretty 
similar fashion, but instead of a fallback mechanism we went with the approach 
that a table configuration (or version?) indicates the version resolver 
implementation, either file based or directory listing based.
   
   @jacques-n I don't think that this approach to fallback to directory listing 
in case of failing to resolve the version hint file can lead to silent 
"corruption" of the table. Can you think of such a scenario? I'm asking cause 
we were considering doing something similar so I'm interested in any thoughts 
that may question this approach.
   But to your point if there so much is a possibility that this approach can 
lead to table corruption then we should probably not provide this feature at 
all, let alone say support it by a "turn-this-feature-at-own-risk" approach - 
it will still corrupt the table :)
   
   @pvary Big difference in the listing directory vs file approach is the 
constant time in loading data from a file (constant time) vs using hadoop 
listing directory (probably not constant time).
   I would assume it would take longer to resolve the version and probably it 
would be advised that older versions are dropped so the list API provides a 
decent latency.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to