I think it would be nice to separate what client API users need from the provided-dependencies issue. It seems like whatever module client projects depend on should itself only have dependencies on things that it actually needs. If it doesn't need hadoop, then it shouldn't declare it as a dependency at all. The hadoop-dependent server and the hadoop-independent client interface both need to share intermediate objects, but it seems like those could be defined in another, common hadoop-independent module.
In/OutputFormats are an exception, but I agree they would be best separated into their own hadoop-dependent module (which might itself depend on the client module).

As far as the provided question goes, it seems to me that the only reason to mark a dep provided is if we think developers will *usually* want to compile against different versions. Initially I thought it would make sense if we thought the runtime versions would vary, but Chris makes a good point that the deps we include in the distributed package can be selected independently of the maven dep scope. Since you can build accumulo against any version of hadoop and it will still run against any other version of hadoop, I think it's better to make things easier on us by making it compile scoped. If someone depends on the accumulo server, then they may have to exclude the transitive dependency if our hadoop is polluting theirs, but I think that issue can be mitigated by not requiring client apps to depend on the entire server.

On Wed, Nov 6, 2013 at 5:17 PM, Joey Echeverria <[email protected]> wrote:

> Do Accumulo users need Hadoop or its dependencies in order to use the
> client APIs?
>
> The only client API that I could see needing it would be the
> [In|Out]putFormats, but it'd be cool if that was a separate module and
> that module had the appropriate Hadoop dependencies with the compile
> scope.
>
> -Joey
>
> On Wed, Nov 6, 2013 at 5:05 PM, Christopher <[email protected]> wrote:
> > What's the latest opinion whether things should be marked "provided" in
> > the pom?
> > I've changed my mind on this a few times, myself, so I'm curious what
> > others think.
> >
> > The provided scope means that it will not propagate as a transitive
> > dependency. Other than that, it doesn't do much... though we can
> > control packaging based on provided or not.
> >
> > I'm not sure this gets us much, and it's inconvenient for users. We
> > can control packaging in other ways (like being more explicit and
> > carefully considering which dependencies we include in an RPM or
> > tarball, for instance).
> >
> > If we drop its declaration, what this means is that if users want to
> > build with Accumulo as a dependency, but against a different version
> > of Hadoop than what we declare in our POM, they'll have to explicitly
> > <exclude> the hadoop dependencies, and redeclare them, or they will
> > have to use their <dependencyManagement> section to force a particular
> > dependency of hadoop.
> >
> > The advantage to users, though, if we drop this, is that they won't
> > have to constantly re-declare transitive dependencies to get their
> > projects to build/test/run.
> >
> > See http://s.apache.org/maven-dependency-scopes
> >
> > Thoughts?
> >
> > --
> > Christopher L Tubbs II
> > http://gravatar.com/ctubbsii
>
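
For anyone following along, here is a rough sketch of the two workarounds Christopher describes for a downstream project that wants a different hadoop than the one our POM declares. Everything specific in it is an assumption made for illustration: the com.example project, the accumulo-core and hadoop-client coordinates, and both version numbers. It is not taken from our actual POM.

```xml
<!-- Hypothetical downstream pom.xml; all coordinates and versions are illustrative assumptions. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>accumulo-client-app</artifactId>
  <version>1.0-SNAPSHOT</version>

  <!-- Workaround B: pin the hadoop version for every path that pulls it in,
       including transitive ones, without touching individual dependencies. -->
  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.2.0</version> <!-- assumed target version -->
      </dependency>
    </dependencies>
  </dependencyManagement>

  <dependencies>
    <!-- Workaround A: exclude the hadoop that arrives transitively via accumulo... -->
    <dependency>
      <groupId>org.apache.accumulo</groupId>
      <artifactId>accumulo-core</artifactId>
      <version>1.5.0</version> <!-- assumed version -->
      <exclusions>
        <exclusion>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-client</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <!-- ...and redeclare it directly; the version comes from dependencyManagement above. -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
    </dependency>
  </dependencies>
</project>
```

Either workaround is normally enough on its own; they are shown together only to keep the sketch short. If hadoop stays at provided scope instead, the exclusion is unnecessary, but the direct hadoop-client declaration becomes mandatory for every consumer, which is the inconvenience being weighed in this thread.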
