[ 
https://issues.apache.org/jira/browse/AVRO-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859659#action_12859659
 ] 

Scott Carey commented on AVRO-512:
----------------------------------

I agree that Avro should not require MapReduce -- specifically the maven POM 
should not cause consumers to pull MapReduce by default.

In fact, that is already what happens: the POM generated by the build marks 
hadoop-core as "optional", so downstream projects that consume Avro won't 
automatically pull in the Hadoop jar.  Another option with a similar effect is 
to declare the dependency with scope "provided" instead of "compile", which 
makes the jar available at compile and test time but does not bundle it or 
propagate it transitively.  That is probably preferable for MapReduce.  If 
users want to use those APIs, they have to supply their own hadoop-core jar or 
declare the dependency themselves.
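For illustration, the two options might look like this in a POM -- a 
hypothetical fragment only; the coordinates and version are placeholders:

```xml
<!-- Option 1: mark hadoop-core optional so consumers do not pull it
     transitively (placeholder version). -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>0.20.2</version>
  <optional>true</optional>
</dependency>

<!-- Option 2: "provided" scope; available for compile and test,
     but neither bundled nor propagated to downstream projects. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>0.20.2</version>
  <scope>provided</scope>
</dependency>
```

Either way, a downstream project that wants the MapReduce APIs would declare 
its own hadoop-core dependency explicitly.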

Putting the code in Hadoop is probably a problem, unless we want to release 
new versions of 0.18, 0.19, 0.20, etc.  Placing it in Hadoop means that 
changes to Avro's lower-level APIs will break compatibility with the version 
in Hadoop.  Honestly, some of those APIs are going to keep evolving, and 
dot-releases of Avro may break those APIs (but not the encoded formats).  
Until these APIs are more locked down, it is better to keep packages like 
this in the Avro project.

-----------
Going slightly off topic now:

A few other libraries Avro bundles have similar issues -- optional side 
features should be flagged as either "provided" or "optional" in the Maven 
POM.  Or, the project needs to be split into a few jars.

avro-core
->  avro-genavro
->  avro-protocol
->  avro-mapred
->  avro-reflect

probably covers the main dependency chunks.  avro-core can get away with only 
jackson, slf4j, and commons-lang, I think -- meaning the generic and specific 
APIs, file formats, etc. would still work.
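A split like the one above could be expressed as a multi-module build.  A 
hypothetical parent-POM sketch, using only the module names suggested here 
(not an existing layout):

```xml
<!-- Hypothetical parent POM fragment: one module per dependency chunk.
     Each submodule would then declare avro-core as a dependency and pull
     in only its own extras (e.g. hadoop-core for avro-mapred). -->
<modules>
  <module>avro-core</module>
  <module>avro-genavro</module>
  <module>avro-protocol</module>
  <module>avro-mapred</module>
  <module>avro-reflect</module>
</modules>
```

With that layout, consumers who only need the core formats would depend on 
avro-core alone and never see Hadoop on their classpath.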


> define and implement mapreduce connector protocol
> -------------------------------------------------
>
>                 Key: AVRO-512
>                 URL: https://issues.apache.org/jira/browse/AVRO-512
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Doug Cutting
>
> Avro should provide Hadoop Mapper and Reducer implementations that connect to 
> a subprocess in another programming language, transmitting raw binary values 
> to and from that process.  This should be modeled after Hadoop Pipes.  It 
> would allow one to easily write efficient mapreduce programs in non-Java 
> languages that process Avro-format data.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
