[jira] [Closed] (SPARK-13246) Avro 1.7.7 Schema.parse race condition hangs task

koert kuipers (JIRA) Fri, 01 Apr 2016 16:36:40 -0700

[ https://issues.apache.org/jira/browse/SPARK-13246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


koert kuipers closed SPARK-13246.
---------------------------------
    Resolution: Workaround

The workaround is to build Spark with Hadoop included (not provided).

> Avro 1.7.7 Schema.parse race condition hangs task
> -------------------------------------------------
>
>                 Key: SPARK-13246
>                 URL: https://issues.apache.org/jira/browse/SPARK-13246
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.0
>         Environment: Spark 1.6.0 on YARN with Hadoop provided, running on CDH 5.5
>            Reporter: koert kuipers
>
> I noticed that a job reading Avro files had some tasks that never
> finished. Thread dumps showed them stuck in:
> java.util.HashMap.removeEntryForKey(HashMap.java:690)
> java.util.HashMap.remove(HashMap.java:656)
> org.apache.avro.util.WeakIdentityHashMap.reap(WeakIdentityHashMap.java:140)
> org.apache.avro.util.WeakIdentityHashMap.containsKey(WeakIdentityHashMap.java:58)
> org.apache.avro.LogicalTypes.fromSchemaIgnoreInvalid(LogicalTypes.java:55)
> org.apache.avro.Schema.parse(Schema.java:1318)
> org.apache.avro.Schema.parse(Schema.java:1260)
> org.apache.avro.Schema$Parser.parse(Schema.java:1024)
> org.apache.avro.Schema$Parser.parse(Schema.java:1012)
> org.apache.avro.Schema.parse(Schema.java:1064)
> org.apache.avro.mapred.AvroJob.getInputSchema(AvroJob.java:73)
> org.apache.avro.mapred.AvroRecordReader.<init>(AvroRecordReader.java:41)
> org.apache.avro.mapred.AvroInputFormat.getRecordReader(AvroInputFormat.java:71)
> org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:237)
> org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208)
> org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> The issue is that Schema.parse is not thread-safe, and I have multiple tasks
> calling this method in the same executor.
> See:
> https://issues.apache.org/jira/browse/AVRO-1773
>  
> I believe this will affect spark-avro as well, although I have not tried it
> yet.
> For me this behavior showed up when upgrading from Spark 1.5.1 to 1.6.0; I am
> not sure why it did not manifest in Spark 1.5.1.
> Since I cannot reliably override the Avro version that ships with Spark
> from my program (or can I? I tried with an older Spark and it failed), this
> means I currently cannot use the Avro format except with only 1 core per
> executor on YARN.
> I believe the fix is to upgrade to Avro 1.8.0.
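
For parse calls that application code controls, the race described above can be avoided by serializing the calls. The following is a minimal sketch under stated assumptions, not a fix for this issue: SerializedParser and its constructor argument are hypothetical names that belong to neither Avro nor Spark. With Avro, the wrapped function would presumably be something like s -> new Schema.Parser().parse(s). It does not cover the path in the stack trace, where AvroRecordReader calls Schema.parse internally.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/**
 * Hypothetical helper: wraps a parse function that is not thread-safe so that
 * at most one thread runs it at a time, and caches results per input string.
 */
final class SerializedParser<T> {
    // Single lock shared by all callers of this instance.
    private final Object lock = new Object();
    // Results of earlier parses, keyed by the raw schema text.
    private final ConcurrentMap<String, T> cache = new ConcurrentHashMap<>();
    // The underlying, non-thread-safe parse function (an assumption of this sketch).
    private final Function<String, T> unsafeParse;

    SerializedParser(Function<String, T> unsafeParse) {
        this.unsafeParse = unsafeParse;
    }

    T parse(String text) {
        // Fast path: already parsed, no locking needed.
        T cached = cache.get(text);
        if (cached != null) {
            return cached;
        }
        // Slow path: only one thread at a time may invoke the unsafe parser,
        // regardless of which input it is parsing.
        synchronized (lock) {
            return cache.computeIfAbsent(text, unsafeParse);
        }
    }
}
```

One instance would have to be shared by every task in the executor JVM (for example through a static field) for the lock to have any effect. The cache means each distinct schema string is parsed once per JVM.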


