Sandeep, we use the `TableOperations` API to plug in other metastores, as well as other customizations like `FileIO` implementations. Thanks to Matt and Yifei for their work extending this area!
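For readers unfamiliar with that plug-in point, here is a rough, self-contained sketch of the contract's shape. The names `MetastoreOps` and `InMemoryOps` are hypothetical stand-ins, not Iceberg API; the real interface is `org.apache.iceberg.TableOperations`, which exchanges `TableMetadata` objects (rather than strings) and also exposes the table's `FileIO`:

```java
// Illustrative only: a trimmed-down, hypothetical analogue of Iceberg's
// TableOperations contract, the plug-in point for custom metastores.
// Metadata is modeled as a plain String so the sketch is self-contained.
import java.util.concurrent.atomic.AtomicReference;

interface MetastoreOps {                     // hypothetical stand-in
  String current();                          // last metadata this client saw
  String refresh();                          // reload the pointer from the metastore
  void commit(String base, String updated);  // atomic swap; fails on conflict
}

// A metastore backed by a single in-process pointer. An HMS-backed
// implementation would instead swap the pointer transactionally in HMS.
class InMemoryOps implements MetastoreOps {
  private final AtomicReference<String> pointer;
  private String lastSeen;

  InMemoryOps(String initial) {
    this.pointer = new AtomicReference<>(initial);
    this.lastSeen = initial;
  }

  @Override public String current() { return lastSeen; }

  @Override public String refresh() {
    lastSeen = pointer.get();
    return lastSeen;
  }

  @Override public void commit(String base, String updated) {
    // Compare-and-swap: the commit succeeds only if no other writer
    // committed since `base` was read.
    if (!pointer.compareAndSet(base, updated)) {
      throw new IllegalStateException("Concurrent commit detected");
    }
    lastSeen = updated;
  }
}
```

The key design point this models is that a catalog plug-in only has to provide an atomic pointer swap; everything else (snapshots, manifests, data files) lives in the file system behind the `FileIO` abstraction.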
On Mon, Mar 18, 2019 at 8:49 PM Sandeep Nayak <osgig...@gmail.com> wrote:

> To Xabriel's point, it would be good to have a Store abstraction so that
> one could plug in an implementation, be it HMS or something else.
>
> On Mon, Mar 18, 2019 at 3:20 PM Xabriel Collazo Mojica
> <xcoll...@adobe.com.invalid> wrote:
>
>> +1 for having a tool/API to migrate tables from HMS into Iceberg.
>>
>> We do not use HMS in my current project, but since HMS is the de facto
>> catalog in most companies doing Hadoop, I think such a tool would be
>> vital for incentivizing Iceberg adoption and/or PoCs.
>>
>> *Xabriel J Collazo Mojica* | Senior Software Engineer | Adobe |
>> xcoll...@adobe.com
>>
>> *From: *<aokolnyc...@apple.com> on behalf of Anton Okolnychyi
>> <aokolnyc...@apple.com.INVALID>
>> *Reply-To: *"dev@iceberg.apache.org" <dev@iceberg.apache.org>
>> *Date: *Monday, March 18, 2019 at 2:22 PM
>> *To: *"dev@iceberg.apache.org" <dev@iceberg.apache.org>, Ryan Blue
>> <rb...@netflix.com>
>> *Subject: *Re: Extend SparkTableUtil to Handle Tables Not Tracked in
>> Hive Metastore
>>
>> I definitely support this idea. Having a clean and reliable API to
>> migrate existing Spark tables to Iceberg will be helpful.
>>
>> I propose to collect all requirements for the new API in this thread.
>> Then I can come up with a doc that we will discuss within the community.
>>
>> From the feature perspective, I think it would be important to support
>> tables that persist partition information in HMS as well as tables that
>> derive partition information from the folder structure. Also, migrating
>> just a partition of a table would be useful.
>>
>> On 18 Mar 2019, at 18:28, Ryan Blue <rb...@netflix.com.INVALID> wrote:
>>
>> I think that would be fine, but I want to throw out a quick warning:
>> SparkTableUtil was initially written as a few handy helpers, so it
>> wasn't well designed as an API. It's really useful, so I can understand
>> wanting to extend it. But should we come up with a real API for these
>> conversion tasks instead of updating the hacks?
>>
>> On Mon, Mar 18, 2019 at 11:11 AM Anton Okolnychyi
>> <aokolnyc...@apple.com.invalid> wrote:
>>
>> Hi,
>>
>> SparkTableUtil can be helpful for migrating existing Spark tables into
>> Iceberg. Right now, SparkTableUtil assumes that the partition
>> information is always tracked in the Hive metastore.
>>
>> What about extending SparkTableUtil to handle Spark tables that don't
>> rely on the Hive metastore? I have a local prototype that uses Spark's
>> InMemoryFileIndex to infer the partitioning info.
>>
>> Thanks,
>> Anton
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix

--
Ryan Blue
Software Engineer
Netflix
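The path-based partition inference discussed in the thread (what Spark's InMemoryFileIndex recovers for tables not registered in HMS) amounts to parsing Hive-style `key=value` directory names. A minimal sketch of that idea; `inferPartition` is a hypothetical helper, not part of SparkTableUtil:

```java
// Illustrative only: derive partition values from a Hive-style directory
// layout such as dt=2019-03-18/hr=07/part-0.parquet. This is the kind of
// information a file index can recover when partitions are not in HMS.
import java.util.LinkedHashMap;
import java.util.Map;

class PartitionPaths {
  static Map<String, String> inferPartition(String path) {
    Map<String, String> values = new LinkedHashMap<>();
    for (String segment : path.split("/")) {
      int eq = segment.indexOf('=');
      if (eq > 0) {
        // Segment looks like key=value: record it as a partition column.
        values.put(segment.substring(0, eq), segment.substring(eq + 1));
      }
      // Plain segments (db, table, file names) carry no partition info.
    }
    return values;
  }
}
```

For example, `inferPartition("db/tbl/dt=2019-03-18/hr=07/part-0.parquet")` yields `{dt=2019-03-18, hr=07}`.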