Re: Hive Metastore integration future

Mass Dosage Wed, 29 Jan 2020 09:18:24 -0800

On the topic of Hive versions - we've definitely experienced some issues
trying to programmatically use the iceberg-spark-runtime artifact in unit
tests (it uses Hive 1.2 as mentioned above). We then tried to also use some
other common Hive testing libraries like HiveRunner
<https://github.com/klarna/HiveRunner/> and BeeJU
<https://github.com/HotelsDotCom/beeju> which in turn use Hive 2.3. We then
ended up with exceptions (e.g. "Method not found") due to incompatibilities
between the Hive library classes and had to abandon the testing libraries.
I can share these exceptions if that would be useful, but I'd imagine that
keeping binary compatibility across Hive, Spark and Iceberg will be quite a
challenge. I'd prefer Iceberg defaulting to Hive 2.3.x over 1.2, as 1.2 is
pretty old. I don't think any of the commercial Hadoop vendors officially
support it any more, and I think it's used a lot less now than 2.x, but I
could be wrong. Alternatively, a way to pick and choose a Hive version would
be great, but it would probably be quite a bit of work to pull off...
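
When two Hive versions end up on the same test classpath, a quick way to
narrow down this kind of clash is to print which jar each Hive class was
actually loaded from. The following is only a minimal sketch using standard
JVM reflection: the helper class name ClasspathOrigin is hypothetical, and the
two Hive class names are just examples of classes one might check.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of Iceberg, Hive or any test library.
final class ClasspathOrigin {

    // Returns a one-line description of where the named class was loaded from.
    static String describe(String className) {
        try {
            // initialize=false: locate the class without running its static initializers.
            Class<?> cls = Class.forName(className, false, ClasspathOrigin.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // JDK classes loaded by the bootstrap loader have no code source.
            return className + ": " + (src == null ? "no code source (JDK class)" : src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return className + ": could not be loaded";
        }
    }

    public static void main(String[] args) {
        // Example Hive classes; substitute whichever class appears in the stack trace.
        System.out.println(describe("org.apache.hadoop.hive.metastore.HiveMetaStoreClient"));
        System.out.println(describe("org.apache.hadoop.hive.conf.HiveConf"));
    }
}
```

Running this from inside the failing unit test shows which artifact "wins"
for each class, which can help decide what to exclude or relocate.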


Adrian

On Wed, 29 Jan 2020 at 16:59, Ryan Blue <rb...@netflix.com.invalid> wrote:

> Hi Kris,
>
> We use version 1.2.1 because the part that we're using hasn't changed much
> and we want to ensure compatibility with old metastore versions. Iceberg
> should work with newer metastores, and feel free to open a bug if you find
> a problem with one. We'll make sure to fix it to be compatible with a range
> of versions.
>
> I'm not sure what people are going to want eventually. Right now, we know
> that many people use the Hive metastore to track tables, so it makes sense
> to support it as an option. Iceberg allows you to plug in your own
> metastore easily because we know that lots of places (Netflix included)
> have their own metastore implementations. I'd really love to see a new
> metastore project, but I don't think that Iceberg should be opinionated
> about which one you use.
>
> rb
>
> On Wed, Jan 29, 2020 at 7:32 AM Kristopher Kane <kk...@etsy.com.invalid>
> wrote:
>
>> Hi Iceberg.
>>
>> It looks like for most cases where non-atomic rename is required, using
>> the Hive metastore is the baseline, with the ability to implement a custom one.
>>
>> I couldn't find any mailing list history or GitHub issue suggesting that
>> Iceberg will implement its own. Is that intended for the future?
>>
>> I ask because Iceberg's metastore version pin is 1.2.1, which is very
>> old.  Someone using Iceberg with a Hive metastore might find it difficult
>> to keep pace with Hive upgrades.
>>
>> Related:  Is the intention here that existing Hive users would use the
>> store that they have and new Iceberg users would implement custom?
>>
>> Appreciate help in understanding,
>>
>> Kris
>>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
