>    Oh but s3Guard will not solve the atomicity problem, right?

S3Guard does solve the atomicity problem, because compactors don't just rename 
directories.

The basic consistency ACID needs is list-after-delete and list-after-create 
(which S3 does not have).

The compactors also place a file named '_orc_acid_version' in the directory.

This happens after rename() returns.

        fs.rename(fileStatus.getPath(), newPath);
        AcidUtils.OrcAcidVersion.writeVersionFile(newPath, fs);

With S3Guard, all that is needed is to check for that file: if it is missing, 
the directory is not a complete compacted dir yet.
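
A rough sketch of that check, in case it helps. It uses java.nio.file on a 
local directory as a stand-in so that it runs on its own; against S3 the 
equivalent would be an fs.exists() call on the Hadoop FileSystem. The class 
and method names are made up for illustration and are not Hive code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CompactedDirCheck {
    // Marker file name as mentioned in this mail.
    static final String ACID_VERSION_FILE = "_orc_acid_version";

    /** True only if the marker file already exists in the directory. */
    static boolean isCompleteCompactedDir(Path dir) {
        return Files.isRegularFile(dir.resolve(ACID_VERSION_FILE));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical compacted directory; the marker is not written yet.
        Path dir = Files.createTempDirectory("base_0000005");
        System.out.println(isCompleteCompactedDir(dir));

        // Simulate the step that runs after rename() returns.
        Files.createFile(dir.resolve(ACID_VERSION_FILE));
        System.out.println(isCompleteCompactedDir(dir));
    }
}
```

A reader that sees the directory but not the marker would simply skip it and 
keep using the older deltas.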

However, the "open a txn for compact & commit it" is definitely neater.

> So that means that the directory will be "visible while in progress", and
>  the reader might pick up the compacted directory even when all files
> haven't been copied.

In another thread today, I mentioned how ACID is built on top of ignoring 
directories, so it can do that easily.

The Parquet or Avro transactional system in Hive boils down to a PathFilter 
with some numbers in the path.
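
To make the "numbers in the path" idea concrete, here is a deliberately 
simplified sketch. It assumes directory names of the form base_<id> and 
delta_<min>_<max> and a single high watermark; the real logic in AcidUtils 
resolves against a full list of valid transactions, which this does not 
attempt. It uses a plain Predicate<String> instead of Hadoop's PathFilter so 
it runs without Hadoop on the classpath; the class name is hypothetical.

```java
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WriteIdDirFilter implements Predicate<String> {
    // Assumed, simplified naming: base_<id> or delta_<min>_<max>.
    private static final Pattern DIR =
        Pattern.compile("base_(\\d+)|delta_\\d+_(\\d+)");

    private final long highWatermark;

    public WriteIdDirFilter(long highWatermark) {
        this.highWatermark = highWatermark;
    }

    /** Accepts a directory only if its highest id is visible to the reader. */
    @Override
    public boolean test(String dirName) {
        Matcher m = DIR.matcher(dirName);
        if (!m.matches()) {
            return false; // not a directory this filter understands; ignore it
        }
        String max = m.group(1) != null ? m.group(1) : m.group(2);
        return Long.parseLong(max) <= highWatermark;
    }

    public static void main(String[] args) {
        WriteIdDirFilter filter = new WriteIdDirFilter(7);
        for (String d : new String[] {
                "base_0000005", "delta_0000006_0000007",
                "delta_0000008_0000008", "_tmp_compaction"}) {
            System.out.println(d + " -> " + filter.test(d));
        }
    }
}
```

Anything the filter rejects is invisible to the reader, which is why an 
in-progress directory can sit there without being picked up.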

Cheers,
Gopal    

