Thanks all for explaining. I misunderstood the original proposal.

-1 to putting them in our distributions, +1 to providing Hive uber jars as Seth and Aljoscha advise.
Hive is just a connector, no matter how important it is. So I totally agree that we shouldn't put it in our distributions. We can start offering three uber jars:

- flink-sql-connector-hive-1 (uber jar with Hive dependency version 1.2.1)
- flink-sql-connector-hive-2 (uber jar with Hive dependency version 2.3.4)
- flink-sql-connector-hive-3 (uber jar with Hive dependency version 3.1.1)

In my understanding, that is quite enough for users.

Best,
Jingsong Lee

On Sun, Dec 15, 2019 at 12:42 PM Jark Wu <imj...@gmail.com> wrote:

> I agree with Seth and Aljoscha and think that is the right way to go.
> We already provide uber jars for Kafka and Elasticsearch for an
> out-of-the-box experience; you can see the download links on this page [1].
> Users can easily download the connectors and versions they like and drop
> them into the SQL CLI lib directory. The uber jars contain all the required
> dependencies and may be shaded. This way, users can skip building an uber
> jar themselves.
> Hive is indeed a "connector" too, and should also follow this approach.
>
> Best,
> Jark
>
> [1]:
> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html#dependencies
>
> On Sat, 14 Dec 2019 at 03:03, Aljoscha Krettek <aljos...@apache.org>
> wrote:
>
> > I was going to suggest the same thing as Seth. So yes, I'm against having
> > Flink distributions that contain Hive, but I am for convenience downloads
> > as we have for Hadoop.
> >
> > Best,
> > Aljoscha
> >
> > > On 13. Dec 2019, at 18:04, Seth Wiesman <sjwies...@gmail.com> wrote:
> > >
> > > I'm also -1 on separate builds.
> > >
> > > What about publishing convenience jars that contain the dependencies
> > > for each version? For example, there could be a flink-hive-1.2.1-uber.jar
> > > that users could just add to their lib folder that contains all the
> > > necessary dependencies to connect to that Hive version.
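(For illustration, here is roughly what the user side of this would look like after dropping the matching uber jar into the SQL CLI lib directory: registering a HiveCatalog in the SQL client configuration. This is a sketch based on the Hive catalog options in the current docs; the catalog name and paths below are made up.)

```yaml
# sql-client-defaults.yaml -- register a HiveCatalog once the matching
# uber jar (e.g. a flink-sql-connector-hive-2 jar) is in the lib directory.
# "myhive" and /opt/hive-conf are placeholders for the user's own setup.
catalogs:
  - name: myhive
    type: hive
    hive-conf-dir: /opt/hive-conf   # directory containing hive-site.xml
    hive-version: 2.3.4
```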
> > >
> > >
> > > On Fri, Dec 13, 2019 at 8:50 AM Robert Metzger <rmetz...@apache.org>
> > > wrote:
> > >
> > >> I'm generally not opposed to convenience binaries if a huge number of
> > >> people would benefit from them and the overhead for the Flink project
> > >> is low. I have not seen a huge demand for such binaries yet (neither
> > >> for the Flink + Hive integration). Looking at Apache Spark, they also
> > >> only offer convenience binaries for Hadoop.
> > >>
> > >> Maybe we could provide a "Docker Playground" for Flink + Hive in the
> > >> documentation (and the flink-playgrounds.git repo)?
> > >> (similar to
> > >> https://ci.apache.org/projects/flink/flink-docs-master/getting-started/docker-playgrounds/flink-operations-playground.html
> > >> )
> > >>
> > >>
> > >> On Fri, Dec 13, 2019 at 3:04 PM Chesnay Schepler <ches...@apache.org>
> > >> wrote:
> > >>
> > >>> -1
> > >>>
> > >>> We shouldn't need to deploy additional binaries to make a feature
> > >>> remotely usable.
> > >>> This usually points to something else being done incorrectly.
> > >>>
> > >>> If it is indeed such a hassle to set up Hive on Flink, then my
> > >>> conclusion would be that either
> > >>> a) the documentation needs to be improved,
> > >>> b) the architecture needs to be improved,
> > >>> or, if all else fails, c) we provide a utility script to make setup
> > >>> easier.
> > >>>
> > >>> We spent a lot of time on reducing the number of binaries in the
> > >>> Hadoop days, and also went to extra lengths to avoid a separate Java 11
> > >>> binary, and I see no reason why Hive should get special treatment in
> > >>> this matter.
> > >>>
> > >>> Regards,
> > >>> Chesnay
> > >>>
> > >>> On 13/12/2019 09:44, Bowen Li wrote:
> > >>>> Hi all,
> > >>>>
> > >>>> I want to propose to have a couple of separate Flink distributions
> > >>>> with Hive dependencies on specific Hive versions (2.3.4 and 1.2.1).
> > >>>> The distributions will be provided to users on the Flink download
> > >>>> page [1].
> > >>>>
> > >>>> A few reasons to do this:
> > >>>>
> > >>>> 1) The Flink-Hive integration is important to many Flink and Hive
> > >>>> users in two dimensions:
> > >>>>     a) for Flink metadata: HiveCatalog is the only persistent catalog
> > >>>> for managing Flink tables. With Flink 1.10 supporting more DDL, the
> > >>>> persistent catalog will play an even more critical role in users'
> > >>>> workflows.
> > >>>>     b) for Flink data: the Hive data connector (source/sink) helps
> > >>>> both Flink and Hive users unlock new use cases in streaming,
> > >>>> near-realtime/realtime data warehousing, backfill, etc.
> > >>>>
> > >>>> 2) Currently users have to go through a *really* tedious process to
> > >>>> get started, because it requires lots of extra jars (see [2]) that are
> > >>>> absent from Flink's lean distribution. We've had many users from the
> > >>>> public mailing list, private email, and DingTalk groups who got
> > >>>> frustrated spending lots of time figuring out the jars themselves.
> > >>>> They would rather have a more "out of the box" quickstart experience
> > >>>> and play with the catalog and source/sink without hassle.
> > >>>>
> > >>>> 3) It's easier for users to swap in those Hive dependencies for their
> > >>>> own Hive versions - just replace the jars with the right versions,
> > >>>> with no need to consult the docs.
> > >>>>
> > >>>> * Hive 2.3.4 and 1.2.1 are two versions that represent a large user
> > >>>> base out there, and that's why we use them as examples for the
> > >>>> dependencies in [1], even though we now support almost all Hive
> > >>>> versions [3].
> > >>>>
> > >>>> I want to hear what the community thinks about this, and how to
> > >>>> achieve it if we believe that's the way to go.
> > >>>>
> > >>>> Cheers,
> > >>>> Bowen
> > >>>>
> > >>>> [1] https://flink.apache.org/downloads.html
> > >>>> [2]
> > >>>> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/hive/#dependencies
> > >>>> [3]
> > >>>> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/hive/#supported-hive-versions
> > >>>>
> > >>>
> > >>
> >

--
Best,
Jingsong Lee