"It would be simply to gain full functionality of Hive". That should read Iceberg.

On Wed, Jan 29, 2020 at 12:55 PM Kristopher Kane <kk...@etsy.com> wrote:

> Adrian, "I'd imagine that keeping binary compatibility across Hive, Spark
> and Iceberg will be quite a challenge." Yeah, this is what I'm afraid of
> over time. Iceberg's big draw for me is only maintaining a processing
> engine (Spark), Iceberg and cloud storage compatibility and any potential
> Iceberg use wouldn't even be with the rest of the Hive ecosystem. It would
> be simply to gain full functionality of Hive via a ready-to-use metastore
> which, right now, defaults to Hive. Hive 3, with Ranger and Atlas and
> Ranger based security, takes things even further away for Spark as it is not
> allowing interaction with Hive intrinsic services like the metastore
> anyway. It might be that you can run the Hive 3 metastore for now but the
> paths forward don't suggest that is accessible for much into the future.
>
> Ryan, when you said, "I'd really love to see a new metastore project," did
> you mean internal to the Iceberg project?
>
> Kris
>
> On Wed, Jan 29, 2020 at 12:17 PM Mass Dosage <massdos...@gmail.com> wrote:
>
>> On the topic of Hive versions - we've definitely experienced some issues
>> trying to programmatically use the iceberg-spark-runtime artifact in unit
>> tests (it uses Hive 1.2 as mentioned above). We then tried to also use some
>> other common Hive testing libraries like HiveRunner
>> <https://github.com/klarna/HiveRunner/> and BeeJU
>> <https://github.com/HotelsDotCom/beeju> which in turn use Hive 2.3. We
>> then ended up with exceptions (e.g. "Method not found") due to
>> incompatibilities between the Hive library classes and had to abandon the
>> testing libraries. I can share these exceptions if that would be useful but
>> I'd imagine that keeping binary compatibility across Hive, Spark and
>> Iceberg will be quite a challenge. I'd prefer Iceberg defaulting to Hive
>> 2.3.x over 1.2 as 1.2 is pretty old, I don't think any of the commercial
>> Hadoop vendors officially support it any more and I think it's used a lot
>> less now than 2.x but I could be wrong. Alternatively a way to pick and
>> choose a Hive version would be great but probably quite a bit of work to
>> pull off...
>>
>> Adrian
>>
>> On Wed, 29 Jan 2020 at 16:59, Ryan Blue <rb...@netflix.com.invalid>
>> wrote:
>>
>>> Hi Kris,
>>>
>>> We use version 1.2.1 because the part that we're using hasn't changed
>>> much and we want to ensure compatibility with old metastore versions.
>>> Iceberg should work with newer metastores, and feel free to open a bug if
>>> you find a problem with one. We'll make sure to fix it to be compatible
>>> with a range of versions.
>>>
>>> I'm not sure what people are going to want eventually. Right now, we
>>> know that many people use the Hive metastore to track tables, so it makes
>>> sense to support it as an option. Iceberg allows you to plug in your own
>>> metastore easily because we know that lots of places (Netflix included)
>>> have their own metastore implementations. I'd really love to see a new
>>> metastore project, but I don't think that Iceberg should be opinionated
>>> about which one you use.
>>>
>>> rb
>>>
>>> On Wed, Jan 29, 2020 at 7:32 AM Kristopher Kane <kk...@etsy.com.invalid>
>>> wrote:
>>>
>>>> Hi Iceberg.
>>>>
>>>> It looks like for most cases where non-atomic rename is required, using
>>>> the Hive metastore is the baseline with the ability to implement a custom
>>>> one.
>>>>
>>>> I couldn't find mailing list history or a GitHub issue that suggests that
>>>> Iceberg will implement its own. Is that intended for the future?
>>>>
>>>> I ask because Iceberg's metastore version pin is 1.2.1 which is very
>>>> old. Someone using Iceberg, with a Hive metastore, might find it
>>>> difficult to keep pace with Hive upgrades.
>>>>
>>>> Related: Is the intention here that existing Hive users would use the
>>>> store that they have and new Iceberg users would implement custom?
>>>>
>>>> Appreciate help in understanding,
>>>>
>>>> Kris
>>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix