You may be able to solve this with careful exclusions. It seems Kafka is monolithic, with no separation between the connector and the engine. If you know, for example, that ZooKeeper is not required by the connector (you have to be sure), you can exclude it as a dependency. We have done this for Hadoop 1, where we only use the HDFS client functionality.
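
To illustrate what I mean by an exclusion, here is a rough sketch assuming a Maven build. The Kafka coordinates (`org.apache.kafka:kafka_2.10`) and the `${kafka.version}` property are assumptions for illustration, not taken from our POMs, and whether ZooKeeper can really be dropped would have to be verified against what the connector actually uses at runtime:

```xml
<!-- Sketch only: artifact coordinates and version property are assumed, not from the actual build -->
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_2.10</artifactId>
  <version>${kafka.version}</version>
  <exclusions>
    <!-- Only safe if the connector never touches ZooKeeper classes at runtime -->
    <exclusion>
      <groupId>org.apache.zookeeper</groupId>
      <artifactId>zookeeper</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Running `mvn dependency:tree` before and after should show whether the excluded artifact is really gone or is still pulled in through another path.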

On Mon, Sep 29, 2014 at 6:40 PM, Gyula Fóra <[email protected]> wrote:

> Yes, you are right, Kafka and Flume are the heavy ones.
>
> We always have the choice to take them out of the package and maybe have
> a separate repo for all the different connectors and only keep the 1-2 most
> important ones. I don't think there's much else to do because we don't use
> the packages you mentioned, but they get pulled in by the Kafka and Flume
> dependencies.
>
> On Mon, Sep 29, 2014 at 6:24 PM, Stephan Ewen <[email protected]> wrote:
>
> > The streaming connectors currently pull in a massive number of dependencies.
> >
> > For example, we transitively get the Scala compiler/reflection/etc. and
> > ZooKeeper.
> >
> > A lot of stuff comes with Flume and Kafka. Are those required to make the
> > connectors work? Otherwise, it might be good to exclude them, to prevent
> > conflicts for users that actually depend on those components.
