> Oh but s3Guard will not solve the atomicity problem, right?
S3Guard does solve the atomicity problem, because compactors don't just rename directories. The basic consistency ACID needs is list-after-delete and list-after-create, which S3 does not provide on its own.

They also place a file named '_orc_acid_version' in the directory. This happens after rename() returns:

    fs.rename(fileStatus.getPath(), newPath);
    AcidUtils.OrcAcidVersion.writeVersionFile(newPath, fs);

With S3Guard, all a reader needs to do is check for that file: if it is missing, the directory is not yet a complete compacted directory. That said, the "open a txn for compact & commit it" approach is definitely neater.

> So that means that the directory will be "visible while in progress", and
> the reader might pick up the compacted directory even when all files
> haven't been copied.

In another thread today, I mentioned that ACID is built on top of ignoring directories, so it can easily ignore an incomplete one. The Parquet or Avro transactional system in Hive boils down to a PathFilter with some numbers in the path.

Cheers,
Gopal
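The reader-side check described above (skip a directory until '_orc_acid_version' exists, and filter directories by the numbers in their names) can be sketched as a small directory filter. This is a hypothetical, simplified example, not Hive's AcidUtils code: it uses java.nio.file in place of Hadoop's FileSystem and PathFilter so it runs standalone, assumes simplified base_N / delta_min_max directory names, requires the marker on every accepted directory, and uses a single high-watermark in place of Hive's real list of valid write IDs. The class name CompactedDirFilter is illustrative.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical, simplified stand-in for a Hive-style PathFilter.
 * Uses java.nio.file instead of Hadoop's FileSystem; not Hive's actual logic.
 */
public class CompactedDirFilter {
    // Marker file the compactor writes after rename() returns.
    static final String VERSION_FILE = "_orc_acid_version";

    // Assumed, simplified naming: base_<N> or delta_<min>_<max>.
    private static final Pattern BASE = Pattern.compile("base_(\\d+)");
    private static final Pattern DELTA = Pattern.compile("delta_(\\d+)_(\\d+)");

    // Assumption: one high-watermark instead of Hive's valid write-ID list.
    private final long highWatermark;

    CompactedDirFilter(long highWatermark) {
        this.highWatermark = highWatermark;
    }

    /** Returns true if a reader should include this directory. */
    boolean accept(Path dir) {
        String name = dir.getFileName().toString();
        Matcher b = BASE.matcher(name);
        Matcher d = DELTA.matcher(name);
        long maxId;
        if (b.matches()) {
            maxId = Long.parseLong(b.group(1));
        } else if (d.matches()) {
            maxId = Long.parseLong(d.group(2));
        } else {
            return false; // not an ACID directory: ignore it
        }
        if (maxId > highWatermark) {
            return false; // newer than anything this reader may see
        }
        // Renamed but no marker yet: the compacted directory is incomplete.
        return Files.isRegularFile(dir.resolve(VERSION_FILE));
    }

    public static void main(String[] args) throws IOException {
        Path table = Files.createTempDirectory("acid_table");
        Path done = Files.createDirectory(table.resolve("base_0000005"));
        Files.createFile(done.resolve(VERSION_FILE));
        Path inProgress = Files.createDirectory(table.resolve("base_0000007"));

        CompactedDirFilter filter = new CompactedDirFilter(10);
        System.out.println(filter.accept(done));       // true
        System.out.println(filter.accept(inProgress)); // false: marker not written yet
    }
}
```

The filter is only as good as the listing it is given: if the store's directory listing is stale after a create or delete, a directory can be missed or still appear, which is the list-after-create / list-after-delete consistency that S3Guard adds on top of S3.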