Re: Extending spark datasource

Dave Sugden Fri, 13 Sep 2019 12:03:59 -0700

This is great. Thanks!

-d



On Fri, Sep 13, 2019 at 2:37 PM Ryan Blue <[email protected]> wrote:

> Okay, thanks for explaining. I understand now.
>
> The Hadoop table implementation is the only place where rename is used,
> and it requires a file system that supports atomic rename. If you're using
> an object store like S3 or GCS, then you should be using the HMS
> implementation or a custom catalog instead of Hadoop tables.
>
> The difference between these is how Iceberg keeps track of the current
> root metadata file. HMS tables store the metadata location as a table
> property of a table in the Hive MetaStore, and use the table locking API to
> coordinate updates. If you're using the Hive MetaStore, then this should
> work out of the box.
>
> If you are using an alternative metastore, then you just need to implement
> a custom catalog that handles the atomic swap from one metadata location to
> another. Mouli just added a guide for doing this here (thanks!):
> http://iceberg.apache.org/custom-catalog/
>
> That's where you'd plug in your preferred method for making an atomic
> update. That could be locking with ZooKeeper, using a database transaction,
> or some other method. You just need to provide a way to atomically swap
> metadata file location strings, and a way to get the current location.
>
> I hope that helps! In the end it should be easier, since the API for
> plugging in already exists.
> --
> Ryan Blue
> Software Engineer
> Netflix
>
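
A minimal sketch of the two operations described in the quoted message: a way
to get the current metadata location, and a way to atomically swap one
location string for another. This is not the Iceberg API. The class and
method names, the in-process lock, and the s3:// paths are all hypothetical
and only illustrate the compare-and-swap contract. A real custom catalog
would back the swap with a database transaction, a ZooKeeper lock, or a
similar mechanism.

```python
import threading


class CommitFailedError(Exception):
    """Raised when the expected metadata location is stale."""


class InMemoryMetadataPointer:
    """Hypothetical stand-in for a metastore that tracks one table's
    current root metadata file location."""

    def __init__(self, initial_location=None):
        self._lock = threading.Lock()
        self._location = initial_location

    def current_location(self):
        # A way to get the current location.
        with self._lock:
            return self._location

    def swap(self, expected, new):
        # A way to atomically swap metadata file location strings.
        # The check and the write happen under one lock, so two writers
        # cannot both succeed from the same base location.
        with self._lock:
            if self._location != expected:
                raise CommitFailedError(
                    f"expected {expected!r}, found {self._location!r}")
            self._location = new


# Hypothetical paths, for illustration only.
pointer = InMemoryMetadataPointer("s3://bucket/tbl/metadata/v1.metadata.json")
base = pointer.current_location()
pointer.swap(base, "s3://bucket/tbl/metadata/v2.metadata.json")

# A second writer that still holds the old base is rejected and must retry
# from the new current location.
try:
    pointer.swap(base, "s3://bucket/tbl/metadata/v2-other.metadata.json")
    stale_commit_rejected = False
except CommitFailedError:
    stale_commit_rejected = True
```

The point of the sketch is that the comparison against the expected location
and the write of the new location happen as one atomic step. That is the
property a file system rename provides for Hadoop tables and that an object
store like S3 or GCS does not.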
