Sandy Ryza created SPARK-1851:
---------------------------------
Summary: Upgrade Avro dependency to 1.7.6 so Spark can read Avro files
Key: SPARK-1851
URL: https://issues.apache.org/jira/browse/SPARK-1851
Project: Spark
Issue Type: Improvement
Components: Spark Core
Reporter: Sandy Ryza
Priority: Critical
I tried to set up a basic example of a Spark job that reads an Avro container
file using Avro specific records. This results in a ClassCastException: can't
convert GenericData.Record to com.cloudera.sparkavro.User.
The reason is:
* When creating records, Avro decides whether to produce a specific or a
generic record by trying to load a class with the name specified in the schema.
* Initially, executors have only the system jars (which include Avro) on their
classpath. They load the app jars dynamically with a URLClassLoader that is set
as the context classloader for the task threads.
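* Avro tries to load the generated classes with
SpecificData.class.getClassLoader(). That loader is the AppClassLoader, which
loaded Avro from the system jars and cannot see the app jars. The lookup
therefore sidesteps the URLClassLoader, fails, and Avro falls back to creating
generic records.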
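The classloader mismatch above can be reproduced without Spark or Avro. The following is a minimal Java sketch, not Avro's code: `AvroLoaderLookup`, `User`, and `lookupSpecificClass` are hypothetical names. A loader whose parent is the bootstrap loader stands in for a loader that cannot see the app jars, the loader that loaded the example class stands in for the task thread's context classloader, and a `null` result models Avro deciding to build a generic record.

```java
public class AvroLoaderLookup {

    // Hypothetical stand-in for a generated specific class such as
    // com.cloudera.sparkavro.User.
    public static class User {}

    // Simplified model of the lookup described above (not Avro's source):
    // only the given loader is consulted; null means "treat as generic".
    static Class<?> lookupSpecificClass(String name, ClassLoader loader) {
        try {
            return Class.forName(name, false, loader);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Delegates only to the bootstrap loader, so it cannot see application
        // classes: this models a loader that cannot see the app jars.
        ClassLoader avroLoader = new ClassLoader(null) {};
        // The loader that loaded this class can see User: this models the
        // task thread's context classloader.
        ClassLoader appLoader = AvroLoaderLookup.class.getClassLoader();

        String name = User.class.getName();
        System.out.println("via Avro's loader: " + lookupSpecificClass(name, avroLoader));
        System.out.println("via app loader:    " + lookupSpecificClass(name, appLoader));
    }
}
```

The first lookup prints `null` while the second finds the class, which mirrors the executor situation: the generated class exists, just not in the loader that Avro asks.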
Avro 1.7.6 includes a change (AVRO-987) that falls back to the thread's context
classloader when loading through SpecificData.class.getClassLoader() fails. I
tested with Avro 1.7.6 and did not observe the problem.
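The fallback order attributed to AVRO-987 can be sketched the same way. This is an illustrative Java sketch under stated assumptions, not the Avro 1.7.6 source: `ContextLoaderFallback`, `User`, and `lookupSpecificClass` are hypothetical names, the bootstrap-parented loader models Avro's own loader, and setting the context classloader models what Spark does for task threads.

```java
public class ContextLoaderFallback {

    // Hypothetical stand-in for a generated specific record class.
    public static class User {}

    // Sketch of the lookup order described for AVRO-987 (not Avro's code):
    // try the loader that loaded Avro first, then the thread's context loader.
    static Class<?> lookupSpecificClass(String name, ClassLoader avroLoader) {
        try {
            return Class.forName(name, false, avroLoader);
        } catch (ClassNotFoundException e) {
            // Not visible to Avro's loader; fall through to the context loader.
        }
        ClassLoader ctx = Thread.currentThread().getContextClassLoader();
        if (ctx != null) {
            try {
                return Class.forName(name, false, ctx);
            } catch (ClassNotFoundException e) {
                // Not visible anywhere.
            }
        }
        return null; // the caller would build a generic record instead
    }

    public static void main(String[] args) {
        ClassLoader avroLoader = new ClassLoader(null) {}; // cannot see app classes
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        // Model a task thread whose context loader can see the app jar.
        thread.setContextClassLoader(ContextLoaderFallback.class.getClassLoader());
        try {
            System.out.println(lookupSpecificClass(User.class.getName(), avroLoader));
        } finally {
            thread.setContextClassLoader(previous);
        }
    }
}
```

Restoring the previous context classloader in `finally` keeps the demo from leaking its setting into other code that runs on the same thread.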