[GitHub] [iceberg] pvary commented on a change in pull request #1465: Core: Enhance version-hint.txt recovery with file listing

GitBox Thu, 17 Sep 2020 03:10:40 -0700


pvary commented on a change in pull request #1465:
URL: https://github.com/apache/iceberg/pull/1465#discussion_r490128410




##########
File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java
##########
@@ -289,19 +311,24 @@ int readVersionHint() {
       return Integer.parseInt(in.readLine().replace("\n", ""));
 
     } catch (Exception e) {
-      LOG.warn("Error reading version hint file {}", versionHintFile, e);
+      LOG.debug("Error reading version hint file {}", versionHintFile, e);

Review comment:
   > I think this approach is actually better cause it's simple however it 
doesn't control the latency of the list API...
   
   If I understand correctly, HadoopTableOperations.refresh() uses the number 
provided by versionHint() only as a starting point and then checks for newer 
versions. Would that cover your concerns?
   ```
         Path nextMetadataFile = getMetadataFile(ver + 1);
         while (nextMetadataFile != null) {
           ver += 1;
           metadataFile = nextMetadataFile;
           nextMetadataFile = getMetadataFile(ver + 1);
         }
   
         updateVersionAndMetadata(ver, metadataFile.toString());
   ```
   
   > @pvary Big difference in the listing directory vs file approach is the 
constant time in loading data from a file (constant time) vs using hadoop 
listing directory (probably not constant time).
   
   With the current patch we still resolve the version through the fast, 
file-based path, and fall back to the listing-based resolution only if there 
is a problem with the version-hint.txt file.
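
To make the intended order concrete, here is a minimal sketch of the "hint first, listing only on failure" idea. It is not the code from the patch: it uses java.nio.file instead of the Hadoop FileSystem API so it can run on its own, the class and method names (VersionHintSketch, resolveVersion, maxListedVersion) are hypothetical, and the v<N>.metadata.json file naming is an assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical sketch; not the HadoopTableOperations code from the PR.
final class VersionHintSketch {
  // Assumed naming: v<N>.metadata.json stored next to version-hint.txt.
  private static final Pattern METADATA_FILE = Pattern.compile("v(\\d+)\\.metadata\\.json");

  private VersionHintSketch() {}

  /** Resolves the latest version: hint file first, directory listing only on failure. */
  static int resolveVersion(Path metadataDir) {
    int ver;
    try {
      // Fast path: one small file read, independent of the number of versions.
      byte[] raw = Files.readAllBytes(metadataDir.resolve("version-hint.txt"));
      ver = Integer.parseInt(new String(raw, StandardCharsets.UTF_8).trim());
    } catch (IOException | NumberFormatException e) {
      // Slow path: the hint is missing or unreadable, so list the directory once.
      ver = maxListedVersion(metadataDir);
    }
    // Either result is only a starting point: probe forward for newer versions.
    while (Files.exists(metadataDir.resolve("v" + (ver + 1) + ".metadata.json"))) {
      ver += 1;
    }
    return ver;
  }

  /** Returns the highest version found by listing, or 0 if no metadata file exists. */
  static int maxListedVersion(Path metadataDir) {
    try (Stream<Path> files = Files.list(metadataDir)) {
      return files
          .map(p -> METADATA_FILE.matcher(p.getFileName().toString()))
          .filter(Matcher::matches)
          .mapToInt(m -> Integer.parseInt(m.group(1)))
          .max()
          .orElse(0);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

In this sketch the listing cost is paid only when the hint cannot be read or parsed, and a stale hint is still corrected by the forward probe.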
   
   > I would assume it would take longer to resolve the version and probably it 
would be advised that older versions are dropped so the list API provides a 
decent latency.
   
   Based on my current understanding, the older versions might be needed for 
time travel or other features. It should really be up to the users to decide 
when to remove them.
   



