On Fri, Sep 13, 2019 at 1:47 PM Ryan Blue <[email protected]> wrote:
> Hi Dave,
>
> I'm sure we can get this working, but I'd like to understand what you're
> trying to do a bit better.
>
> Why do you need atomic rename? Iceberg is set up to write data in place
> and not move or rename files. Committing those files to a table is an
> atomic operation instead. Everything should work with GCS without
> modification as far as I know, unless you don't want to use the Hadoop
> FileSystem APIs.

There is no native atomic rename in GCS; a "rename" there is a copy followed
by a delete. From https://iceberg.apache.org/spec/#mvcc-and-optimistic-concurrency :

"Tables do not require rename, except for tables that use atomic rename to
implement the commit operation for new metadata files."

This ^ is what we are addressing, i.e. the point when a snapshot commit
occurs and the tmp metadata file is renamed to the snapshot metadata file.

From HadoopTableOperations.java L#248:

/**
 * Renames the source file to destination, using the provided file system.
 * If the rename failed, an attempt will be made to delete the source file.
 *
 * @param fs the filesystem used for the rename
 * @param src the source file
 * @param dst the destination file
 */
private void renameToFinal(FileSystem fs, Path src, Path dst) {
  try {
    if (!fs.rename(src, dst)) {
      ...

The above is called during commit, and AFAIK it comes with the assumption
that FileSystem.rename() is atomic... ?

> Keeping file appenders open using a write property or a table property
> sounds like a good idea to me. I wouldn't want this to be the default for
> batch writes, but I think it may make sense as an option for streaming
> writes. I'd prefer to add these features to the existing streaming writer
> instead of allowing users to use their own custom writer. Are there other
> reasons to replace the writer instead of making this behavior configurable?

Nope, that was the only reason. That is fine then, if this could be
supported for streaming writes.
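To make the commit assumption concrete, here is a minimal, hypothetical sketch (not Iceberg code, and not using the Hadoop FileSystem API) of the property the rename-based commit relies on: writing the new metadata to a temp file, then atomically publishing it so that exactly one concurrent committer can win. Since java.nio's ATOMIC_MOVE silently overwrites an existing target on POSIX, this sketch emulates the fail-if-exists behavior with an atomic hard-link creation; the class and method names (AtomicCommitSketch, commit) are invented for illustration.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of a commit-by-atomic-publish, for illustration only.
// On HDFS, FileSystem.rename() provides this "either I win or I don't"
// semantics natively; on GCS, where rename is copy + delete, it does not.
public class AtomicCommitSketch {

    // Writes content to a temp file, then atomically links it into place.
    // Returns true if this committer won, false if dst already existed.
    static boolean commit(Path tmp, Path dst, byte[] content) throws IOException {
        Files.write(tmp, content);
        try {
            // Hard-link creation is atomic and fails if dst already exists,
            // mimicking a rename that refuses to clobber a committed file.
            Files.createLink(dst, tmp);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false; // another committer won the race
        } finally {
            // Mirror renameToFinal's cleanup: remove the source/temp file.
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("commit-sketch");
        Path dst = dir.resolve("v2.metadata.json");
        boolean first = commit(dir.resolve("tmp-1"), dst, "snapshot-1".getBytes());
        boolean second = commit(dir.resolve("tmp-2"), dst, "snapshot-2".getBytes());
        System.out.println(first + " " + second); // exactly one committer wins
    }
}
```

The point of the sketch is only the semantics: if the publish step can be interleaved with another writer's copy + delete (as on GCS), two committers can both believe they succeeded, which is why a different commit primitive is needed there.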
