Re: Iceberg and Hive

Ryan Blue Mon, 07 Jan 2019 14:29:51 -0800

Vladi,

I'll add a little to Owen's answer for context. Owen was right that using
an Iceberg table in Hive will require some work implementing the RawStore
API. But the `iceberg-hive` module already uses the Hive Metastore to
keep track of Iceberg metadata.


An Iceberg table isn't a Hive table. Iceberg requires extra metadata and
doesn't meet assumptions that Hive makes about data because Iceberg tracks
what files are in a table differently. Still, most people want to use a
Hive Metastore instance to track Iceberg tables because they already have
one. That's what iceberg-hive provides. It stores Iceberg's root metadata
location and ensures changes to that location through the Iceberg library
are atomic.
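
As a rough illustration of the atomic location swap described above, here is
a minimal sketch. It is not the iceberg-hive implementation: the class and
function names are hypothetical, and an in-memory dict guarded by a lock
stands in for the Hive Metastore. It only shows the idea that a commit
succeeds when the root metadata location is still the one the writer started
from.

```python
import threading


class InMemoryMetastore:
    """Hypothetical stand-in for a metastore mapping table name -> metadata location."""

    def __init__(self):
        self._lock = threading.Lock()
        self._locations = {}

    def current_location(self, table):
        """Return the current root metadata location, or None if the table is unknown."""
        with self._lock:
            return self._locations.get(table)

    def swap_location(self, table, expected, new):
        """Set the location to `new` only if it still equals `expected`.

        The check and the update happen under one lock, so two concurrent
        writers cannot both succeed from the same starting location.
        """
        with self._lock:
            if self._locations.get(table) != expected:
                return False
            self._locations[table] = new
            return True


def commit(store, table, base_location, new_location):
    """Publish `new_location`, failing if another writer committed first."""
    if not store.swap_location(table, base_location, new_location):
        raise RuntimeError(
            f"commit to {table} failed: metadata location changed since {base_location!r}"
        )
```

A writer that loses the race gets an error rather than silently overwriting
the other commit; it would then re-read the current location and retry on top
of it.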

While you can use iceberg-hive to keep track of tables, engines also need
to use iceberg-hive to access those tables. Right now, the only engine
that does this is Presto, in the open PR. I also need to update the Spark
support to use iceberg-hive by default instead of only HDFS-based tables.
This is something we intend to get done for the 1.0 release.

I hope that helps!

rb

On Mon, Jan 7, 2019 at 1:09 PM Owen O'Malley <[email protected]> wrote:

> The group has moved to the Apache infrastructure, so we should use
> [email protected] .
>
> What is required, but not started, is for someone to implement Hive's
> RawStore API with an Iceberg backend. That would let you use Hive SQL
> commands to manipulate the Iceberg tables.
>
> .. Owen
>
>
> On Mon, Jan 7, 2019 at 1:01 PM 'Vladi Feigin' via Iceberg Developers <
> [email protected]> wrote:
>
>> Hello,
>>
>> I'm still a bit confused about how Iceberg interacts with Hive (metastore).
>> In our case we have many Hive tables and a lot of Spark and Presto jobs
>> reading, creating, and writing to Hive.
>> Moving to Iceberg, even gradually, raises a few questions:
>> 1. Are new tables created via Iceberg visible (by Spark/Presto) in the Hive
>> metastore as well?
>> 2. Should we somehow migrate existing Hive tables to be supported by
>> Iceberg?
>> 3. Is there any impact on the existing (Spark, Presto) jobs when moving to
>> Iceberg?
>>
>> I understand that creating a new system from scratch with Iceberg is
>> probably easier compared to projects that heavily use the Hive metastore, but
>> this is the use case in a lot of projects nowadays.
>> Thank you
>> Vladi Feigin
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Iceberg Developers" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/iceberg-devel/5d38541c-f73f-471f-b8db-5430238c4376%40googlegroups.com
>> <https://groups.google.com/d/msgid/iceberg-devel/5d38541c-f73f-471f-b8db-5430238c4376%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>


-- 
Ryan Blue
Software Engineer
Netflix
