[ 
https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905124#action_12905124
 ] 

Scott Carey commented on AVRO-647:
----------------------------------

bq. Finally, to be clear, is there a motive for this beyond better expressing 
dependencies? Functionally sticking everything in a single jar with lots of 
optional dependencies works fine, but folks then have to guess which 
dependencies they actually need, and that's the primary problem this seeks to 
solve. Is that right, or are there other problems too?

That is the main case here.  Dependencies become more explicit.  Users should 
be able to consume the parts they need without too much accidental baggage.  
Alternatively, we could simply document all of this clearly, so that users are 
armed with the information necessary to configure their builds to exclude the 
transitive dependencies they don't use.
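As an illustration of that documentation-only approach, a downstream project on 
ivy could exclude the hadoop chain when depending on avro.  This is only a 
sketch -- the coordinates and revision are assumptions, not Avro's published 
ones:

```xml
<!-- ivy.xml of a downstream project: depend on avro, but strip the
     hadoop-related transitive dependencies it does not need.
     org/name/rev values here are illustrative. -->
<dependency org="org.apache.avro" name="avro" rev="1.4.0" conf="default">
  <exclude org="org.apache.hadoop"/>
</dependency>
```

Maven users would do the analogous thing with an <exclusions> block inside the 
<dependency> element.  It works, but every consumer has to get it right 
independently, which is the weakness of documentation alone.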

However, Avro is by nature something that many things will depend on, and 
portions of Avro may in turn depend on some of those same things.  In 
particular, making it easy to avoid circular dependencies is a plus.  As we 
have seen (https://issues.apache.org/jira/browse/AVRO-545), even when ivy/maven 
features can be used to prevent a circular dependency, relying on them makes 
users uneasy.

The guidelines I use for my projects are three-fold:
* If the cascaded set of dependencies is large and likely to conflict with 
other things, it should be easy to separate (for Avro, this is the hadoop 
dependency).
* If the dependency is physically large (a large jar file), consider making it 
easy to separate.
* If the dependency exists only for a minor, rarely used feature, be careful.  
For example, Jackson 1.0.1 being used by hadoop 0.20+ merely for dumping 
configuration files to JSON causes problems.

So for the case of Reflect, if paranamer has neither a lot of cascaded 
dependencies of its own nor a large jar, then including it in avro-data is not 
going to be a big deal.

bq. If we separate jars, it might be good to split the build-time classpath in 
the same manner, by splitting the src tree. 

We have three choices, I think:
1.  Leave the source tree as-is, and have the build use ant file 
excludes/includes to define what is packaged in each jar.  Managing the 
excludes/includes will be troublesome, and would be easier if the split were 
done cleanly along package boundaries.  Not much else would have to change -- 
the compile and test phases would stay the same.  There would also be the 
downside that tests would not implicitly exercise the packaging boundaries.
2.  Break it into different source trees and continue using ant/ivy.  This is 
more work and means we would be breaking up the test and compile phases too.
3.  Break it into different source trees and use maven.  Maven is a natural fit 
for this sort of thing and I'm experienced with it, but it is not trivial and 
others here aren't as familiar with it.  To wire up IDL and the Specific 
compiler, Maven plugins would be required.  Interop testing would probably 
still require ant.
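For option 3, the split might look like the following parent pom.  This is a 
sketch only -- the module names come from this issue's proposal, but the 
version and everything else are assumptions, not a working build:

```xml
<!-- Hypothetical parent pom for a multi-module Avro build (option 3).
     Each module gets its own source tree and declares only the
     dependencies it actually needs. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-parent</artifactId>
  <version>1.5.0-SNAPSHOT</version>
  <packaging>pom</packaging>
  <modules>
    <module>avro</module>        <!-- core: minimal deps (jackson)      -->
    <module>avro-dev</module>    <!-- compilers, IDL; depends on avro   -->
    <module>avro-hadoop</module> <!-- hadoop integration; depends on avro -->
  </modules>
</project>
```

With this layout, a project like pig or hive depends only on the core avro 
module, and the hadoop integration depends on both, so the circular-dependency 
question never arises.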


> Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar
> --------------------------------------------------------------
>
>                 Key: AVRO-647
>                 URL: https://issues.apache.org/jira/browse/AVRO-647
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>
> Our dependencies are starting to get a little complicated on the Java side.
> I propose we build two (possibly more) jars related to our major dependencies 
> and functions.
> 1. avro.jar  (or perhaps avro-core.jar)
> This contains all of the core avro functionality for _using_ avro as a 
> library.  This excludes the specific compiler, avro idl, and other build-time 
> or development tools, as well as avro packages for third party integration 
> such as hadoop.  This jar should then have a minimal set of dependencies 
> (jackson, jetty, SLF4J ?).
> 2. avro-dev.jar
> This would contain compilers, idl, development tools, etc.  Most applications 
> will not need this, but build systems and developers will.
> 3. avro-hadoop.jar
> This would contain the hadoop API and possibly pig/hive/whatever related to 
> that.  This makes it easier for pig/hive/hadoop to consume avro-core without 
> circular dependencies. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
