Hi,
We were trying out Iceberg with Spark and happened to look into the code
responsible for writing the version-hint file.
In the following code snippet:

private void writeVersionHint(int version) {
  Path versionHintFile = versionHintFile();
  FileSystem fs = getFS(versionHintFile, conf);

  try (FSDataOutputStream out = fs.create(versionHintFile, true /* overwrite */)) {
    out.write(String.valueOf(version).getBytes("UTF-8"));

  } catch (IOException e) {
    LOG.warn("Failed to update version hint", e);
  }
}


We observe that the version-hint file is always overwritten under the same
file name on S3. Because `overwrite=true` is passed in the FS call, no HEAD
request is issued against the S3 object when `version-hint.text` is created
for the first time, so we do not hit the eventual consistency issue when
reading the newly created file. Our concern is with the later writes: once
the file has been overwritten several times, a read can return an older
version of the file. This is because S3 is eventually consistent for
overwrite PUTs and DELETEs ("Amazon S3 offers eventual consistency for
overwrite PUTS and DELETES in all regions." [1]).
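
To make the concern concrete, here is a toy sketch in plain Java. It is not
Iceberg, Hadoop, or S3 code: ToyStore, its `lag` argument, and readHint are
hypothetical names made up for illustration, and the lag only imitates a
replica that has not yet seen the most recent overwrites.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StaleHintDemo {

  /** Hypothetical in-memory store that remembers every overwrite of a key. */
  static class ToyStore {
    private final Map<String, List<byte[]>> history = new HashMap<>();

    /** Overwrite PUT: the new value is appended after the older ones. */
    void put(String key, byte[] value) {
      history.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    /**
     * Read served by a replica that is `lag` overwrites behind.
     * lag = 0 returns the latest value; a larger lag returns an older one,
     * but never anything older than the first write.
     */
    byte[] get(String key, int lag) {
      List<byte[]> versions = history.get(key);
      if (versions == null) {
        throw new IllegalStateException("no such key: " + key);
      }
      int index = Math.max(0, versions.size() - 1 - lag);
      return versions.get(index);
    }
  }

  /** Parses the hint the same way it was written: a decimal int in UTF-8. */
  static int readHint(ToyStore store, int lag) {
    byte[] raw = store.get("version-hint.text", lag);
    return Integer.parseInt(new String(raw, StandardCharsets.UTF_8));
  }

  public static void main(String[] args) {
    ToyStore store = new ToyStore();
    // Three commits, each overwriting the same hint file.
    for (int version = 1; version <= 3; version++) {
      store.put("version-hint.text",
          String.valueOf(version).getBytes(StandardCharsets.UTF_8));
    }
    System.out.println(readHint(store, 0)); // up-to-date replica: prints 3
    System.out.println(readHint(store, 1)); // lagging replica: prints 2
  }
}
```

In this model the very first write can never be read stale, which matches
the first-create case above; only the later overwrites can.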

Please let us know whether this is a known issue or whether we are missing
something here. If it is a known issue, what might the repercussions be?

[1] https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html
