To Xabriel's point, it would be good to have a Store abstraction so that one could plug in an implementation, be it HMS or something else.
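To make the suggestion concrete, here is a minimal sketch of what such a pluggable store could look like. All names (`MetadataStore`, `list_partitions`, `InMemoryStore`) are hypothetical illustrations, not actual Iceberg APIs:

```python
# Hypothetical sketch of the "Store" abstraction suggested above: a pluggable
# source of partition metadata, so migration code does not hard-code the Hive
# Metastore. Names are illustrative only, not real Iceberg classes.
from abc import ABC, abstractmethod
from typing import Dict, List


class MetadataStore(ABC):
    """Pluggable source of partition metadata for a table being migrated."""

    @abstractmethod
    def list_partitions(self, table: str) -> List[Dict[str, str]]:
        """Return one {partition-column: value} dict per partition."""


class InMemoryStore(MetadataStore):
    """Trivial implementation for testing; an HMS-backed or filesystem-backed
    store would implement the same interface."""

    def __init__(self, partitions: Dict[str, List[Dict[str, str]]]):
        self._partitions = partitions

    def list_partitions(self, table: str) -> List[Dict[str, str]]:
        return self._partitions.get(table, [])


store: MetadataStore = InMemoryStore({"db.events": [{"day": "2019-03-18"}]})
print(store.list_partitions("db.events"))
```

The point of the interface is that the migration tool only ever talks to `MetadataStore`, so swapping HMS for a filesystem scan (or anything else) is a constructor change rather than a rewrite.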
On Mon, Mar 18, 2019 at 3:20 PM Xabriel Collazo Mojica <xcoll...@adobe.com.invalid> wrote:

> +1 for having a tool/API to migrate tables from HMS into Iceberg.
>
> We do not use HMS in my current project, but since HMS is the de facto
> catalog in most companies doing Hadoop, I think such a tool would be vital
> for incentivizing Iceberg adoption and/or PoCs.
>
> *Xabriel J Collazo Mojica* | Senior Software Engineer | Adobe | xcoll...@adobe.com
>
> *From: *<aokolnyc...@apple.com> on behalf of Anton Okolnychyi <aokolnyc...@apple.com.INVALID>
> *Reply-To: *"dev@iceberg.apache.org" <dev@iceberg.apache.org>
> *Date: *Monday, March 18, 2019 at 2:22 PM
> *To: *"dev@iceberg.apache.org" <dev@iceberg.apache.org>, Ryan Blue <rb...@netflix.com>
> *Subject: *Re: Extend SparkTableUtil to Handle Tables Not Tracked in Hive Metastore
>
> I definitely support this idea. Having a clean and reliable API to migrate
> existing Spark tables to Iceberg will be helpful.
>
> I propose to collect all requirements for the new API in this thread. Then
> I can come up with a doc that we will discuss within the community.
>
> From the feature perspective, I think it would be important to support
> tables that persist partition information in HMS as well as tables that
> derive partition information from the folder structure. Also, migrating
> just a partition of a table would be useful.
>
> On 18 Mar 2019, at 18:28, Ryan Blue <rb...@netflix.com.INVALID> wrote:
>
> I think that would be fine, but I want to throw out a quick warning:
> SparkTableUtil was initially written as a few handy helpers, so it wasn't
> well designed as an API. It's really useful, so I can understand wanting to
> extend it. But should we come up with a real API for these conversion tasks
> instead of updating the hacks?
>
> On Mon, Mar 18, 2019 at 11:11 AM Anton Okolnychyi <aokolnyc...@apple.com.invalid> wrote:
>
> Hi,
>
> SparkTableUtil can be helpful for migrating existing Spark tables into
> Iceberg. Right now, SparkTableUtil assumes that the partition information
> is always tracked in Hive metastore.
>
> What about extending SparkTableUtil to handle Spark tables that don't rely
> on Hive metastore? I have a local prototype that makes use of Spark
> InMemoryFileIndex to infer the partitioning info.
>
> Thanks,
> Anton
>
> --
> Ryan Blue
> Software Engineer
> Netflix
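For readers unfamiliar with the "derive partition information from the folder structure" idea in the quoted thread, here is a small standalone sketch of the underlying technique. It is not Spark's `InMemoryFileIndex` (which is JVM-internal); it just illustrates, in plain Python, how Hive-style `key=value` path segments encode partition values. The path and function name are illustrative:

```python
# Hedged sketch: Hive-style table layouts encode partition values directly in
# directory names (e.g. .../year=2019/month=03/...), so a listing of data file
# paths is enough to recover partition info without consulting a metastore.
from urllib.parse import unquote


def infer_partition(path: str) -> dict:
    """Extract {column: value} pairs from a Hive-style partitioned file path,
    e.g. 'warehouse/events/year=2019/month=03/part-0.parquet'."""
    values = {}
    for segment in path.split("/"):
        if "=" in segment:
            key, _, value = segment.partition("=")
            values[key] = unquote(value)  # Hive URL-encodes special characters
    return values


print(infer_partition("s3://bucket/events/year=2019/month=03/part-0.parquet"))
# {'year': '2019', 'month': '03'}
```

A real implementation also has to infer column *types* and handle non-Hive layouts, which is part of what makes a proper conversion API (rather than ad hoc helpers) attractive.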