pvary commented on a change in pull request #1465:
URL: https://github.com/apache/iceberg/pull/1465#discussion_r490128410
##########
File path:
core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java
##########
@@ -289,19 +311,24 @@ int readVersionHint() {
return Integer.parseInt(in.readLine().replace("\n", ""));
} catch (Exception e) {
- LOG.warn("Error reading version hint file {}", versionHintFile, e);
+ LOG.debug("Error reading version hint file {}", versionHintFile, e);
Review comment:
> I think this approach is actually better cause it's simple however it
doesn't control the latency of the list API...
If I understand correctly, in HadoopTableOperations.refresh() we use the
number returned by readVersionHint() only as a starting point when checking for
newer versions. Would that cover your concerns?
```
Path nextMetadataFile = getMetadataFile(ver + 1);
while (nextMetadataFile != null) {
  ver += 1;
  metadataFile = nextMetadataFile;
  nextMetadataFile = getMetadataFile(ver + 1);
}
updateVersionAndMetadata(ver, metadataFile.toString());
```
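To make the point concrete, here is a minimal standalone sketch of the same forward scan. It is not the Iceberg code: it uses java.nio on a local directory instead of the Hadoop FileSystem, and the class name, the helper names, and the `v<N>.metadata.json` naming are assumptions made for illustration only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class ForwardScanSketch {
  // Hypothetical stand-in for getMetadataFile(): returns the file for the
  // given version, or null if it does not exist. The naming scheme is assumed.
  static Path metadataFile(Path dir, int version) {
    Path candidate = dir.resolve("v" + version + ".metadata.json");
    return Files.exists(candidate) ? candidate : null;
  }

  // Starts from the hinted version and walks forward while a newer file exists,
  // so a stale hint only costs a few extra existence checks.
  static int latestVersionFrom(Path dir, int hint) {
    int ver = hint;
    while (metadataFile(dir, ver + 1) != null) {
      ver += 1;
    }
    return ver;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("metadata");
    for (int v = 1; v <= 4; v++) {
      Files.createFile(dir.resolve("v" + v + ".metadata.json"));
    }
    // The hint is stale (2), but the scan still reaches the newest version.
    System.out.println(latestVersionFrom(dir, 2)); // prints 4
  }
}
```

In this sketch the hint only needs to be a lower bound; any value at or below the latest version leads to the same result.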
> @pvary Big difference in the listing directory vs file approach is the
constant time in loading data from a file (constant time) vs using hadoop
listing directory (probably not constant time).
With the current patch we still resolve the version the fast, file-based way,
and only fall back to listing the directory if there is a problem with the
version-hint.txt file.
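Roughly, the control flow I have in mind looks like the sketch below. This is not the code in the patch: it works on a local directory with java.nio, and the class name, the method names, the `-1` sentinel, and the `v<N>.metadata.json` pattern are all assumptions for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

final class VersionHintFallbackSketch {
  // Assumed naming scheme for metadata files in this sketch.
  private static final Pattern VERSION_FILE = Pattern.compile("v(\\d+)\\.metadata\\.json");

  // Fast path: a single small file read. Returns -1 if the hint file is
  // missing, unreadable, or does not contain an integer.
  static int readHint(Path dir) {
    try {
      byte[] bytes = Files.readAllBytes(dir.resolve("version-hint.txt"));
      return Integer.parseInt(new String(bytes, StandardCharsets.UTF_8).trim());
    } catch (IOException | NumberFormatException e) {
      return -1;
    }
  }

  // Slow path: list the directory and take the highest version found, or 0 if none.
  static int maxListedVersion(Path dir) throws IOException {
    try (Stream<Path> files = Files.list(dir)) {
      return files
          .map(p -> VERSION_FILE.matcher(p.getFileName().toString()))
          .filter(Matcher::matches)
          .mapToInt(m -> Integer.parseInt(m.group(1)))
          .max()
          .orElse(0);
    }
  }

  // The listing is only paid for when the hint cannot be used.
  static int resolveVersion(Path dir) throws IOException {
    int hint = readHint(dir);
    return hint >= 0 ? hint : maxListedVersion(dir);
  }
}
```

With a healthy hint file the cost stays at one file read; the listing latency only matters in the failure case.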
> I would assume it would take longer to resolve the version and probably it
would be advised that older versions are dropped so the list API provides a
decent latency.
Based on my current understanding, the older versions might be needed for
time travel or other features. It should really be up to the users to decide
when to remove them.