This error is likely due to EMR including some Hadoop lib dirs in
spark.{driver,executor}.extraClassPath. (Hadoop bundles an older version of
Avro than what Spark uses, so you are probably getting bitten by this Avro
mismatch.)
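
If you want to confirm that on your own cluster first, one quick check is to
ask the JVM which jar the offending class was actually loaded from. This is a
rough sketch using PySpark's private sc._jvm py4j gateway, so treat it as
illustrative rather than a supported API:

# Illustrative diagnostic, run from a PySpark shell on the cluster: find the
# jar that supplied org.apache.avro.generic.GenericData to the driver JVM.
klass = sc._jvm.java.lang.Class.forName(
    "org.apache.avro.generic.GenericData")
src = klass.getProtectionDomain().getCodeSource()
print(src.getLocation())
# If this points into a Hadoop lib dir rather than the Spark assembly,
# you are picking up Hadoop's older Avro.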

We determined that these Hadoop dirs are not actually necessary to include
in the Spark classpath and in fact seem to be *causing* several problems
such as this one, so we have removed these directories from the
extraClassPath settings for the next EMR release.

For now, you may do the same yourself by using a configuration like the
following when creating your cluster:

[
  {
    "classification": "spark-defaults",
    "properties": {
      "spark.executor.extraClassPath": "/etc/hadoop/conf:/etc/hive/conf:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*",
      "spark.driver.extraClassPath": "/etc/hadoop/conf:/etc/hive/conf:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*"
    }
  }
]
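
If it helps, here is a rough sketch of passing that same configuration
programmatically via boto3's run_job_flow. The cluster name, instance types,
and release label below are placeholders, not recommendations; adjust for
your setup (and note that boto3 is case-sensitive about the
Classification/Properties key names):

import boto3

# Classpath without the /usr/lib/hadoop{,-hdfs,-yarn}/* entries.
CLASSPATH = (
    "/etc/hadoop/conf:/etc/hive/conf:/usr/lib/hadoop-lzo/lib/*:"
    "/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:"
    "/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*"
)

emr = boto3.client("emr")
emr.run_job_flow(
    Name="my-spark-cluster",       # placeholder
    ReleaseLabel="emr-4.2.0",      # placeholder; the Spark 1.5.2 release
    Applications=[{"Name": "Spark"}],
    Configurations=[{
        "Classification": "spark-defaults",
        "Properties": {
            "spark.executor.extraClassPath": CLASSPATH,
            "spark.driver.extraClassPath": CLASSPATH,
        },
    }],
    Instances={
        "MasterInstanceType": "m3.xlarge",  # placeholder
        "SlaveInstanceType": "m3.xlarge",   # placeholder
        "InstanceCount": 3,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)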

(For reference, the removed dirs are /usr/lib/hadoop/*,
/usr/lib/hadoop-hdfs/* and /usr/lib/hadoop-yarn/*.)
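
To double-check what an already-running application actually received, you
can print the live settings from PySpark (a small sketch; sc.getConf()
returns a copy of the SparkConf the context was created with):

# Print the classpath settings the running application was launched with.
conf = sc.getConf()
print(conf.get("spark.driver.extraClassPath", ""))
print(conf.get("spark.executor.extraClassPath", ""))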

Hope this helps!
~ Jonathan

On Wed, Feb 24, 2016 at 1:14 PM <ross.cramb...@thomsonreuters.com> wrote:

> Hadoop 2.6.0 included?
> spark-assembly-1.5.2-hadoop2.6.0.jar
>
> On Feb 24, 2016, at 4:08 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
> Does your Spark version come with batteries (Hadoop included), or is it
> built with Hadoop provided, with you adding the Hadoop binaries to the
> classpath?
>
> On Wed, Feb 24, 2016 at 3:08 PM, <ross.cramb...@thomsonreuters.com> wrote:
>
>> I’m trying to save a data frame in Avro format but am getting the
>> following error:
>>
>>
>> java.lang.NoSuchMethodError: 
>> org.apache.avro.generic.GenericData.createDatumWriter(Lorg/apache/avro/Schema;)Lorg/apache/avro/io/DatumWriter;
>>
>> I found the following workaround:
>> https://github.com/databricks/spark-avro/issues/91
>> - which seems to say that this is from a mismatch in Avro versions. I have
>> tried following both of the solutions detailed there, to no avail:
>>  - Manually downloading avro-1.7.7.jar and including it in
>> /usr/lib/hadoop-mapreduce/
>>  - Adding avro-1.7.7.jar to spark.driver.extraClassPath and
>> spark.executor.extraClassPath
>>  - The same with avro-1.6.6
>>
>> I am still getting the same error, and now I am just stabbing in the
>> dark. Anyone else still running into this issue?
>>
>>
>> I am using Pyspark 1.5.2 on EMR.
>>
>
>
>
