koert kuipers created SPARK-13246:
-------------------------------------
Summary: Avro 1.7.7 Schema.parse race condition hangs task
Key: SPARK-13246
URL: https://issues.apache.org/jira/browse/SPARK-13246
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.6.0
Environment: Spark 1.6.0 with YARN and Hadoop provided, running on CDH 5.5
Reporter: koert kuipers
I noticed that a job reading Avro files had some tasks that never
finished. A thread dump shows them stuck in:
java.util.HashMap.removeEntryForKey(HashMap.java:690)
java.util.HashMap.remove(HashMap.java:656)
org.apache.avro.util.WeakIdentityHashMap.reap(WeakIdentityHashMap.java:140)
org.apache.avro.util.WeakIdentityHashMap.containsKey(WeakIdentityHashMap.java:58)
org.apache.avro.LogicalTypes.fromSchemaIgnoreInvalid(LogicalTypes.java:55)
org.apache.avro.Schema.parse(Schema.java:1318)
org.apache.avro.Schema.parse(Schema.java:1260)
org.apache.avro.Schema$Parser.parse(Schema.java:1024)
org.apache.avro.Schema$Parser.parse(Schema.java:1012)
org.apache.avro.Schema.parse(Schema.java:1064)
org.apache.avro.mapred.AvroJob.getInputSchema(AvroJob.java:73)
org.apache.avro.mapred.AvroRecordReader.<init>(AvroRecordReader.java:41)
org.apache.avro.mapred.AvroInputFormat.getRecordReader(AvroInputFormat.java:71)
org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:237)
org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208)
org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
The issue is that Schema.parse is not thread safe, and I have multiple tasks
calling this method concurrently in the same executor.
See here:
https://issues.apache.org/jira/browse/AVRO-1773
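To make the failure mode concrete, here is a minimal stdlib-only sketch of the hazard class: several threads mutating a plain java.util.HashMap with no synchronization, which is the same pattern AVRO-1773 describes inside WeakIdentityHashMap.reap(). This is an illustration of the racy pattern, not a reproduction against Avro itself; corrupted bucket chains are what can make HashMap.removeEntryForKey spin forever. The bounded join keeps the demo itself from hanging:

```java
import java.util.HashMap;
import java.util.Map;

public class UnsafeMapRace {
    public static void main(String[] args) throws Exception {
        final Map<Integer, Integer> map = new HashMap<>();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 100_000; j++) {
                    map.put(j % 512, j);    // concurrent structural changes...
                    map.remove(j % 512);    // ...may corrupt the bucket chains
                }
            });
            workers[i].setDaemon(true);     // let the JVM exit even if a worker hangs
            workers[i].start();
        }
        for (Thread w : workers) w.join(2_000);  // bounded wait: a hang is possible
        System.out.println("done");              // reached whether or not workers hung
    }
}
```

Whether a given run corrupts the map, loses entries, or hangs is nondeterministic, which matches the symptom above: only some tasks get stuck.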
I believe this will affect spark-avro as well, although I have not tried it yet.
For me this behavior showed up when upgrading from Spark 1.5.1 to 1.6.0; I am
not sure why it did not manifest itself in Spark 1.5.1.
Since I cannot reliably override the Avro version that ships with Spark
from my program (or can I? I tried with an older Spark and it failed), I
currently cannot use the Avro format except when using only 1 core per executor
on YARN.
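As a stopgap until the Avro upgrade, the only safe option seems to be serializing all schema parsing through one JVM-wide lock. A minimal sketch of that shape, where PARSE_LOCK, parse(), and CACHE are stand-ins I made up for illustration (not Avro's API):

```java
import java.util.HashMap;
import java.util.Map;

public class SchemaParseLock {
    // Hypothetical workaround: funnel every schema parse through one
    // JVM-wide lock so a shared, unsynchronized map (like Avro 1.7.7's
    // WeakIdentityHashMap) is never mutated concurrently.
    private static final Object PARSE_LOCK = new Object();
    private static final Map<String, String> CACHE = new HashMap<>();

    static String parse(String schemaJson) {
        synchronized (PARSE_LOCK) {  // one "Schema.parse" at a time per JVM
            return CACHE.computeIfAbsent(schemaJson, s -> s.trim());
        }
    }

    public static void main(String[] args) throws Exception {
        Thread[] tasks = new Thread[8];  // simulate 8 cores per executor
        for (int i = 0; i < tasks.length; i++) {
            tasks[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) parse("schema-" + (j % 16));
            });
            tasks[i].start();
        }
        for (Thread t : tasks) t.join();
        // With the lock the shared map stays consistent: exactly 16 entries.
        System.out.println(CACHE.size());
    }
}
```

In a real job this would mean wrapping whatever code path reaches Schema.parse (here, record-reader creation via AvroInputFormat), which is awkward since that call happens inside Spark/Avro, not user code; hence the 1-core-per-executor limitation above.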
I believe the fix is to upgrade to Avro 1.8.0.