[
https://issues.apache.org/jira/browse/AVRO-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13296148#comment-13296148
]
Jacob Metcalf commented on AVRO-1103:
-------------------------------------
Doug, thanks for doing this. Am based in the UK so tested this morning. There
were some issues that meant I had to extend the patch a bit and I am still left
with two problems that I have not solved.
The first task I had was to make the changes to compile avro-mapred against
Hadoop 2. These are relatively easy as the incompatibility is confined to
TaskAttemptContext & SequenceFileBase.
Secondly, your patched AvroSerialization, whilst it uses the config object, is
still trying to use the parent ClassLoader and so cannot find my Avro class. I
have included my attempt to address this in the patch. Basically I locate the
classloader using:
bq. Class.forName( schema.getFullName()).getClassLoader()
With some additional logic to support UNIONs. I am sure it could be a lot more
elegant/efficient but it worked.
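A minimal sketch of that kind of lookup, assuming the candidate names have already been pulled out of the schema (for a UNION, the full names of its branches). SchemaClassLoaderLookup, locate and its parameters are hypothetical names, not part of Avro or of the attached patch, and plain strings stand in for schema.getFullName() so the example needs no Avro dependency. Unlike the one-argument Class.forName() above, it takes the loader to search from explicitly:

```java
import java.util.List;

// Hypothetical helper for illustration; not part of Avro or the attached patch.
final class SchemaClassLoaderLookup {

    private SchemaClassLoaderLookup() {}

    /**
     * Returns the loader of the first candidate class that can be loaded via
     * searchFrom, or fallback if none of the candidates resolves.
     * For a UNION schema the candidates would be the branch full names.
     */
    static ClassLoader locate(List<String> candidateClassNames,
                              ClassLoader searchFrom,
                              ClassLoader fallback) {
        for (String name : candidateClassNames) {
            try {
                // initialize=false: only the Class object is needed here.
                Class<?> c = Class.forName(name, false, searchFrom);
                ClassLoader loader = c.getClassLoader();
                if (loader != null) {
                    return loader; // null would mean the bootstrap loader
                }
            } catch (ClassNotFoundException | LinkageError e) {
                // Not a loadable class (for example a "null" branch); try the next one.
            }
        }
        return fallback;
    }
}
```

Branches that do not map to a generated class, and JDK types loaded by the bootstrap loader, are skipped, so the fallback is only used when no branch resolves.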
This got me a long way forward. Now Hadoop 2 / Avro 1.7 are able to deserialize
my Avro Specific classes in the shuffle. However I am still left with two
corollary classloader problems:
2) Doing a deepCopy via <MyClass>.newBuilder( <myObject> ).build(). Here the
problem boils down to another use of SpecificData.get() this time in
SpecificRecordBuilderBase.java:
{quote}
protected SpecificRecordBuilderBase(Schema schema) {
  super(schema, SpecificData.get());
}
{quote}
3) Using AvroKeyValueInputFormat to deserialize from a file. This is another
case of needing to pass a SpecificData object to a reader, this time in
AvroRecordReaderBase.java:90:
{quote}
// Wrap the seekable input stream in an Avro DataFileReader.
mAvroFileReader = createAvroFileReader(seekableFileInput,
new ReflectDatumReader<T>(mReaderSchema));
{quote}
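Problems 2 and 3 share a shape: a component is constructed without a data model built around the job's classloader. As a rough illustration only, the sketch below uses hypothetical stand-in types (DataModel, RecordBuilderBase, UserRecord, UserRecordBuilder; none of them are Avro classes) to show how an extra constructor that accepts the model would let the caller supply the right loader. The empty URLClassLoader with a null parent merely simulates a framework-level loader that cannot see job classes:

```java
import java.net.URL;
import java.net.URLClassLoader;

// All types below are hypothetical stand-ins, not Avro classes.
final class DataModel {
    // Simulates a shared model bound to a loader that cannot see job classes.
    private static final DataModel SHARED =
            new DataModel(new URLClassLoader(new URL[0], null));

    private final ClassLoader loader;

    DataModel(ClassLoader loader) {
        this.loader = loader;
    }

    static DataModel get() {
        return SHARED;
    }

    /** Returns the class, or null when this model's loader cannot see it. */
    Class<?> resolve(String className) {
        try {
            return Class.forName(className, false, loader);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }
}

abstract class RecordBuilderBase {
    private final DataModel model;

    // Mirrors the behaviour quoted above: always the shared model.
    protected RecordBuilderBase() {
        this(DataModel.get());
    }

    // Possible addition: let the caller supply the model.
    protected RecordBuilderBase(DataModel model) {
        this.model = model;
    }

    boolean canResolve(String className) {
        return model.resolve(className) != null;
    }
}

// Stands in for a generated Specific class shipped in the job jar.
final class UserRecord {}

final class UserRecordBuilder extends RecordBuilderBase {
    UserRecordBuilder() {
        super();
    }

    UserRecordBuilder(DataModel model) {
        super(model);
    }
}
```

With the default constructor the builder cannot resolve UserRecord, whereas a builder given new DataModel(UserRecord.class.getClassLoader()) can. Whether the real fix should take this form is open for discussion.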
In terms of taking this forward: it's a big ask, but it would help me immensely
if a version of avro-mapred 1.7 for Hadoop 2 could be made available. For
example, the MRUnit team have come up with a way of distributing both Hadoop 1
and 2 versions, with users selecting one using a <classifier>hadoop2</classifier>.
Then, if you are happy with my fix for the first problem, I can help come up
with solutions/test fixes for problems 2 & 3. Happy to raise additional JIRAs
for all these points.
> New AvroDeserializer should Locate Appropriate Classloader
> ----------------------------------------------------------
>
> Key: AVRO-1103
> URL: https://issues.apache.org/jira/browse/AVRO-1103
> Project: Avro
> Issue Type: Improvement
> Components: java
> Affects Versions: 1.7.0
> Environment: Hadoop 0.23.1 with Avro jars replaced by 1.7 jars
> Specific data classes assembled into JAR with mapper/reducer
> Reporter: Jacob Metcalf
> Assignee: Doug Cutting
> Fix For: 1.7.1
>
> Attachments: AVRO-1103-for 0.23.1.patch, AVRO-1103.patch,
> AVRO-1103.patch, AVRO-1103.patch
>
>
> Continuing on from AVRO-873 I believe some more work needs to be done to get
> the MapReduce 2 APIs in Avro 1.7 working with Hadoop 0.23. Since it revolves
> around classloaders it is complex to present a unit test which fails so I
> will explain the problem:
> - By default SpecificDatumReader will use the classloader it was loaded from
> to find a Specific class to deserialize into.
> - In earlier versions of Hadoop e.g. 0.20.2 Avro was not included so
> typically you would bundle Avro into your job jar along with the Specific
> classes so they would be on the same classpath.
>
> - However, later versions of Hadoop such as 0.23 ship with Avro. Thus you find
> that SpecificData.class.getClassLoader() is typically a parent loader
> which just contains Hadoop components.
> - Thus when SpecificData goes to construct a Specific class from the schema
> it cannot locate it and silently defaults to creating a GenericData.
> In AVRO-873 an additional constructor was added to SpecificData to force it
> to use a different classloader. Thus to extend this fix to the new MR2 APIs:
> - AvroDeserializer could attempt to instantiate the class using
> Class.forName() and from this get the appropriate Classloader and pass this
> into the constructor of SpecificDatumReader.
> - Line 2771 of SpecificData.java is:
> bq. Class c = SpecificData.get().getClass(schema);
> - This would need to be changed to:
> bq. Class c = this.getClass(schema);
> I have raised this in the mail groups here:
> http://search-hadoop.com/m/wVUf1aLCwd/classloader/v=threaded so apologies if
> this is already being thought about.