To Xabriel's point, it would be good to have a Store abstraction so that one could plug in an implementation, be it HMS or something else.
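To make the suggestion concrete, here is a minimal sketch of what such a pluggable store could look like. All names (`MetadataStore`, `list_partitions`, `InMemoryStore`) are hypothetical illustrations, not actual Iceberg APIs:

```python
# Hypothetical sketch of the "Store" abstraction suggested above: a pluggable
# source of partition metadata, so migration code does not hard-code the Hive
# Metastore. Names are illustrative only, not real Iceberg classes.
from abc import ABC, abstractmethod
from typing import Dict, List


class MetadataStore(ABC):
    """Pluggable source of partition metadata for a table being migrated."""

    @abstractmethod
    def list_partitions(self, table: str) -> List[Dict[str, str]]:
        """Return one {partition-column: value} dict per partition."""


class InMemoryStore(MetadataStore):
    """Trivial implementation for testing; an HMS-backed or filesystem-backed
    store would implement the same interface."""

    def __init__(self, partitions: Dict[str, List[Dict[str, str]]]):
        self._partitions = partitions

    def list_partitions(self, table: str) -> List[Dict[str, str]]:
        return self._partitions.get(table, [])


store: MetadataStore = InMemoryStore({"db.events": [{"day": "2019-03-18"}]})
print(store.list_partitions("db.events"))
```

The point of the interface is that the migration tool only ever talks to `MetadataStore`, so swapping HMS for a filesystem scan (or anything else) is a constructor change rather than a rewrite.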
On Mon, Mar 18, 2019 at 3:20 PM Xabriel Collazo Mojica <xcoll...@adobe.com.invalid> wrote:

> +1 for having a tool/API to migrate tables from HMS into Iceberg.
>
> We do not use HMS in my current project, but since HMS is the de facto
> catalog in most companies doing Hadoop, I think such a tool would be vital
> for incentivizing Iceberg adoption and/or PoCs.
>
> *Xabriel J Collazo Mojica* | Senior Software Engineer | Adobe | xcoll...@adobe.com
>
> *From: *<aokolnyc...@apple.com> on behalf of Anton Okolnychyi <aokolnyc...@apple.com.INVALID>
> *Reply-To: *"dev@iceberg.apache.org" <dev@iceberg.apache.org>
> *Date: *Monday, March 18, 2019 at 2:22 PM
> *To: *"dev@iceberg.apache.org" <dev@iceberg.apache.org>, Ryan Blue <rb...@netflix.com>
> *Subject: *Re: Extend SparkTableUtil to Handle Tables Not Tracked in Hive Metastore
>
> I definitely support this idea. Having a clean and reliable API to migrate
> existing Spark tables to Iceberg will be helpful.
>
> I propose to collect all requirements for the new API in this thread. Then
> I can come up with a doc that we will discuss within the community.
>
> From the feature perspective, I think it would be important to support
> tables that persist partition information in HMS as well as tables that
> derive partition information from the folder structure. Also, migrating
> just a partition of a table would be useful.
>
> On 18 Mar 2019, at 18:28, Ryan Blue <rb...@netflix.com.INVALID> wrote:
>
> I think that would be fine, but I want to throw out a quick warning:
> SparkTableUtil was initially written as a few handy helpers, so it wasn't
> well designed as an API. It's really useful, so I can understand wanting to
> extend it. But should we come up with a real API for these conversion tasks
> instead of updating the hacks?
>
> On Mon, Mar 18, 2019 at 11:11 AM Anton Okolnychyi <aokolnyc...@apple.com.invalid> wrote:
>
> Hi,
>
> SparkTableUtil can be helpful for migrating existing Spark tables into
> Iceberg. Right now, SparkTableUtil assumes that the partition information
> is always tracked in Hive metastore.
>
> What about extending SparkTableUtil to handle Spark tables that don't rely
> on Hive metastore? I have a local prototype that makes use of Spark
> InMemoryFileIndex to infer the partitioning info.
>
> Thanks,
> Anton
>
> --
> Ryan Blue
> Software Engineer
> Netflix
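For readers unfamiliar with the "derive partition information from the folder structure" idea in the quoted thread, here is a small standalone sketch of the underlying technique. It is not Spark's `InMemoryFileIndex` (which is JVM-internal); it just illustrates, in plain Python, how Hive-style `key=value` path segments encode partition values. The path and function name are illustrative:

```python
# Hedged sketch: Hive-style table layouts encode partition values directly in
# directory names (e.g. .../year=2019/month=03/...), so a listing of data file
# paths is enough to recover partition info without consulting a metastore.
from urllib.parse import unquote


def infer_partition(path: str) -> dict:
    """Extract {column: value} pairs from a Hive-style partitioned file path,
    e.g. 'warehouse/events/year=2019/month=03/part-0.parquet'."""
    values = {}
    for segment in path.split("/"):
        if "=" in segment:
            key, _, value = segment.partition("=")
            values[key] = unquote(value)  # Hive URL-encodes special characters
    return values


print(infer_partition("s3://bucket/events/year=2019/month=03/part-0.parquet"))
# {'year': '2019', 'month': '03'}
```

A real implementation also has to infer column *types* and handle non-Hive layouts, which is part of what makes a proper conversion API (rather than ad hoc helpers) attractive.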