Shipping the connectors with the job jars would thin out the dependencies, but make it more cumbersome to assemble a job jar.
On Mon, Sep 29, 2014 at 6:47 PM, Gyula Fora wrote:
> Thanks, I will look into this and try to figure it out. As you can see, I
> am not a Maven pro :)
>
> On 29 Sep 2014, at 18:44, Stephan Ewen wrote:
>
> > You may be able to solve this with careful exclusions.
> >
> > It seems Kafka is monolithic, with no separation between connector and
> > engine. If you know, for example, that ZooKeeper is not required by the
> > connector (you have to be sure), you can exclude it as a dependency. We
> > have done this for Hadoop1, where we only use the HDFS client
> > functionality.
> >
> > On Mon, Sep 29, 2014 at 6:40 PM, Gyula Fóra wrote:
> >
> >> Yes, you are right, Kafka and Flume are the heavy ones.
> >>
> >> We always have the choice to take them out of the package, maybe have
> >> a separate repo for all the different connectors, and only keep the 1-2
> >> most important ones. I don't think there's much else to do, because we
> >> don't use the packages you mentioned; they get pulled in by the Kafka and
> >> Flume dependencies.
> >>
> >> On Mon, Sep 29, 2014 at 6:24 PM, Stephan Ewen wrote:
> >>
> >>> The streaming connectors currently pull in a massive amount of
> >>> dependencies.
> >>>
> >>> For example, we transitively get the Scala compiler/reflection/etc. and
> >>> ZooKeeper.
> >>>
> >>> A lot of stuff comes with Flume and Kafka. Are those required to make
> >>> the connectors work? Otherwise, it might be good to exclude them, to
> >>> prevent conflicts for users that actually depend on those components.
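To illustrate why exclusions have to be "careful", here is a toy Java model of transitive dependency resolution. It is a sketch under stated assumptions, not Maven's actual resolver: the artifact names and the graph are hypothetical placeholders (not the real Kafka/Flume trees), and version mediation and scopes are ignored. Each exclusion is attached to one direct dependency and prunes only that dependency's subtree, which is similar in spirit to an `<exclusions>` element on a dependency in a POM.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ExclusionSketch {

    // Hypothetical dependency graph: artifact -> its direct dependencies.
    // Placeholder names only; artifacts not listed here have no dependencies.
    static final Map<String, List<String>> GRAPH = Map.of(
            "streaming-connectors", List.of("kafka", "flume"),
            "kafka", List.of("zookeeper", "scala-compiler", "kafka-client"),
            "flume", List.of("flume-sdk", "zookeeper"),
            "scala-compiler", List.of("scala-reflect"));

    // Depth-first walk that skips excluded artifacts together with their subtrees.
    static void collect(String artifact, Set<String> excluded, Set<String> out) {
        if (excluded.contains(artifact) || !out.add(artifact)) {
            return; // excluded, or already visited in this subtree
        }
        for (String dep : GRAPH.getOrDefault(artifact, List.of())) {
            collect(dep, excluded, out);
        }
    }

    // Resolves the classpath of root. Exclusions registered for a direct
    // dependency prune only that dependency's transitive subtree.
    static Set<String> resolve(String root, Map<String, Set<String>> exclusionsByDirectDep) {
        Set<String> classpath = new TreeSet<>();
        classpath.add(root);
        for (String direct : GRAPH.getOrDefault(root, List.of())) {
            Set<String> excluded = exclusionsByDirectDep.getOrDefault(direct, Set.of());
            Set<String> subtree = new HashSet<>();
            subtree.add(direct); // the direct dependency itself is always kept
            for (String child : GRAPH.getOrDefault(direct, List.of())) {
                collect(child, excluded, subtree);
            }
            classpath.addAll(subtree);
        }
        return classpath;
    }

    public static void main(String[] args) {
        System.out.println("no exclusions:    " + resolve("streaming-connectors", Map.of()));
        System.out.println("kafka exclusions: " + resolve("streaming-connectors",
                Map.of("kafka", Set.of("zookeeper", "scala-compiler"))));
    }
}
```

With the exclusions on the hypothetical `kafka` entry, `scala-compiler` and `scala-reflect` disappear, but `zookeeper` stays on the classpath because `flume` still pulls it in. Excluding an artifact on one path does not remove it if another dependency brings it in, so each path has to be checked.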
