Sean, thanks for the corner cases you listed. They make a lot of sense. Now I'm inclined to have Hive 2.3 as the default version.
Dongjoon, apologies if I didn't make it clear before. What made me concerned initially was only the following part:

> can we remove the usage of forked `hive` in Apache Spark 3.0 completely officially?

So having Hive 2.3 as the default Hive version and adding a `hive-1.2` profile to keep the Hive 1.2.1 fork looks like a feasible approach to me. Thanks for starting the discussion!

On Wed, Nov 20, 2019 at 9:46 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

> Yes. Right. That's the situation we are hitting and the result I expected.
> We need to change our default to Hive 2 in the POM.
>
> Dongjoon.
>
> On Wed, Nov 20, 2019 at 5:20 AM Sean Owen <sro...@gmail.com> wrote:
>
>> Yes, good point. A user would get whatever the POM says without
>> profiles enabled, so it matters.
>>
>> Playing it out, an app _should_ compile with the Spark dependency
>> marked 'provided'. In that case, the app that is spark-submit-ted is
>> agnostic to the Hive dependency, as the only one that matters is what's
>> on the cluster. Right? We don't leak the Hive API through the Spark
>> API. And yes, it's then up to the cluster to provide whatever version
>> it wants. Vendors will have made a specific version choice one way or
>> the other when building their distro.
>>
>> If you run a Spark cluster yourself, you're using the binary distro,
>> and we're already talking about also publishing a binary distro with
>> this variation, so that's not the issue.
>>
>> The corner cases where it might matter are:
>>
>> - I unintentionally package Spark in the app and by default pull in
>>   Hive 2 when I will deploy against Hive 1. But that's user error, and
>>   it causes other problems anyway.
>> - I run tests locally in my project, which will pull in the default
>>   version of Hive defined by the POM.
>>
>> Double-checking, is that right? If so, it kind of implies it doesn't
>> matter, which is an argument either way about what the default should
>> be. I too would then prefer defaulting to Hive 2 in the POM. Am I
>> missing something about the implication?
>>
>> (That fork will stay published forever anyway, so that's not an issue
>> per se.)
>>
>> On Wed, Nov 20, 2019 at 1:40 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
>> wrote:
>> > Sean, our published POM points at and advertises the illegitimate
>> > Hive 1.2 fork as a compile dependency.
>> > Yes, it can be overridden. So why does Apache Spark need to publish
>> > it like that?
>> > If someone wants to use that illegitimate Hive 1.2 fork, let them
>> > override it. We are unable to delete the illegitimate Hive 1.2 fork
>> > artifacts; they will be orphans.
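
To make the 'provided'-scope point above concrete, here is a minimal sketch of how an application would typically declare its Spark dependency in Maven (the artifact and version shown are illustrative, not prescriptive). With this scope the app compiles against Spark but does not bundle it, so the Hive version in play at runtime is whatever the cluster's Spark distribution was built with:

    <!-- Sketch: compile against Spark, but don't package it with the app.
         The cluster's Spark (and therefore its Hive version) is used at
         runtime; the artifact and version below are only examples. -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.12</artifactId>
      <version>3.0.0</version>
      <scope>provided</scope>
    </dependency>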
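
And, assuming the profile ends up being named `hive-1.2` as proposed in this thread, the two build variants would look roughly like this with Spark's standard Maven wrapper (the exact flags are a sketch, not a final documented invocation):

    # Default build: resolves the Hive 2.3 dependencies declared in the POM.
    ./build/mvn -DskipTests clean package

    # Opt back into the forked Hive 1.2.1 via the proposed profile.
    ./build/mvn -Phive-1.2 -DskipTests clean package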