Re: Hive Metastore integration future

Ryan Blue Fri, 14 Feb 2020 16:22:48 -0800

Sorry for the late reply, everyone. This slipped to the bottom of my inbox.


As for the Hive version, we were using 1.2.1 for the Spark runtime because
that was the default in Spark. I think that changes in Spark 3.0, so when
we move master over to build for 3.0, we should update the default Hive
version as well. I think that should solve a lot of these problems
because Spark will no longer be reliant on such an old version and we can
continue to use the version that Spark provides. Does that sound like
a good plan?

For the metastore project, I'm not sure whether I would include it in
Iceberg or not. I wouldn't want Iceberg to suffer from feature creep, but I
think that good integration between a new metastore and Iceberg would be
really beneficial. I'd be happy either way, as long as we don't make
things impossible for users that still rely on the Hive metastore or other
implementations.
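
To make the "other implementations" point concrete, here is a minimal
sketch of what a pluggable metastore has to provide, assuming its core job
is to track each table's current metadata location and to swap that pointer
atomically. The class, method names, and paths are hypothetical and are not
Iceberg's API.

```python
import threading


class InMemoryMetastore:
    """Hypothetical stand-in for a custom metastore; not Iceberg's API."""

    def __init__(self):
        self._lock = threading.Lock()
        self._locations = {}  # table name -> current metadata file location

    def current_location(self, table):
        """Return the current metadata location, or None if the table is unknown."""
        with self._lock:
            return self._locations.get(table)

    def commit(self, table, expected, new):
        """Swap the pointer only if it still equals `expected` (compare-and-swap)."""
        with self._lock:
            if self._locations.get(table) != expected:
                return False  # another writer committed first; caller must retry
            self._locations[table] = new
            return True


store = InMemoryMetastore()
# First commit creates the table entry (nothing was expected to exist yet).
created = store.commit("db.events", None, "s3://bucket/events/v1.metadata.json")
# A second writer that still assumes the table is absent loses the race.
stale = store.commit("db.events", None, "s3://bucket/events/v2.metadata.json")
```

A Hive-backed or in-house store could sit behind the same two operations,
which is why supporting several implementations side by side seems feasible.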

rb

On Wed, Jan 29, 2020 at 10:01 AM Kristopher Kane <kk...@etsy.com.invalid>
wrote:

> " It would be simply to gain full functionality of Hive" . That should
> read Iceberg.
>
> On Wed, Jan 29, 2020 at 12:55 PM Kristopher Kane <kk...@etsy.com> wrote:
>
>> Adrian, "I'd imagine that keeping binary compatibility across Hive, Spark
>> and Iceberg will be quite a challenge."  Yeah, this is what I'm afraid of
>> over time.  Iceberg's big draw for me is only maintaining a processing
>> engine (Spark), Iceberg and cloud storage compatibility and any potential
>> Iceberg use wouldn't even be with the rest of the Hive ecosystem. It would
>> be simply to gain full functionality of Hive via a ready-to-use metastore
>> which, right now, defaults to Hive.  Hive 3, with Ranger, Atlas, and
>> Ranger-based security, takes things even further away for Spark, as it does
>> not allow interaction with Hive intrinsic services like the metastore
>> anyway.  It might be that you can run the Hive 3 metastore for now, but the
>> paths forward don't suggest that will remain accessible far into the future.
>>
>> Ryan, when you said, "I'd really love to see a new metastore project,"
>> did you mean internal to the Iceberg project?
>>
>> Kris
>>
>> On Wed, Jan 29, 2020 at 12:17 PM Mass Dosage <massdos...@gmail.com>
>> wrote:
>>
>>> On the topic of Hive versions - we've definitely experienced some issues
>>> trying to programmatically use the iceberg-spark-runtime artifact in unit
>>> tests (it uses Hive 1.2 as mentioned above). We then tried to also use some
>>> other common Hive testing libraries like HiveRunner
>>> <https://github.com/klarna/HiveRunner/> and BeeJU
>>> <https://github.com/HotelsDotCom/beeju> which in turn use Hive 2.3. We
>>> then ended up with exceptions (e.g. "Method not found") due to
>>> incompatibilities between the Hive library classes and had to abandon the
>>> testing libraries. I can share these exceptions if that would be useful but
>>> I'd imagine that keeping binary compatibility across Hive, Spark and
>>> Iceberg will be quite a challenge. I'd prefer Iceberg defaulting to Hive
>>> 2.3.x over 1.2 as 1.2 is pretty old, I don't think any of the commercial
>>> Hadoop vendors officially support it any more and I think it's used a lot
>>> less now than 2.x but I could be wrong. Alternatively a way to pick and
>>> choose a Hive version would be great but probably quite a bit of work to
>>> pull off...
>>>
>>> Adrian
>>>
>>> On Wed, 29 Jan 2020 at 16:59, Ryan Blue <rb...@netflix.com.invalid>
>>> wrote:
>>>
>>>> Hi Kris,
>>>>
>>>> We use version 1.2.1 because the part that we're using hasn't changed
>>>> much and we want to ensure compatibility with old metastore versions.
>>>> Iceberg should work with newer metastores, and feel free to open a bug if
>>>> you find a problem with one. We'll make sure to fix it to be compatible
>>>> with a range of versions.
>>>>
>>>> I'm not sure what people are going to want eventually. Right now, we
>>>> know that many people use the Hive metastore to track tables, so it makes
>>>> sense to support it as an option. Iceberg allows you to plug in your own
>>>> metastore easily because we know that lots of places (Netflix included)
>>>> have their own metastore implementations. I'd really love to see a new
>>>> metastore project, but I don't think that Iceberg should be opinionated
>>>> about which one you use.
>>>>
>>>> rb
>>>>
>>>> On Wed, Jan 29, 2020 at 7:32 AM Kristopher Kane <kk...@etsy.com.invalid>
>>>> wrote:
>>>>
>>>>> Hi Iceberg.
>>>>>
>>>>> It looks like for most cases where non-atomic rename is required,
>>>>> using the Hive metastore is the baseline, with the ability to implement a
>>>>> custom one.
>>>>>
>>>>> I couldn't find mailing list history or a GitHub issue that suggests
>>>>> that Iceberg will implement its own. Is that intended for the future?
>>>>>
>>>>> I ask because Iceberg's metastore version pin is 1.2.1 which is very
>>>>> old.  Someone using Iceberg with a Hive metastore might find it
>>>>> difficult to keep pace with Hive upgrades.
>>>>>
>>>>> Related:  Is the intention here that existing Hive users would use the
>>>>> store that they have and new Iceberg users would implement a custom one?
>>>>>
>>>>> Appreciate help in understanding,
>>>>>
>>>>> Kris
>>>>>
>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Software Engineer
>>>> Netflix
>>>>
>>>

-- 
Ryan Blue
Software Engineer
Netflix
