[
https://issues.apache.org/jira/browse/SPARK-13246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
koert kuipers closed SPARK-13246.
---------------------------------
Resolution: Workaround
Workaround: build Spark with Hadoop included (not provided).
> Avro 1.7.7 Schema.parse race condition hangs task
> -------------------------------------------------
>
> Key: SPARK-13246
> URL: https://issues.apache.org/jira/browse/SPARK-13246
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.6.0
> Environment: Spark 1.6.0 with YARN and Hadoop provided, running on CDH 5.5
> Reporter: koert kuipers
>
> I noticed that a job reading Avro files would have some tasks that never
> finish. Looking at the thread dumps, they were stuck in:
> java.util.HashMap.removeEntryForKey(HashMap.java:690)
> java.util.HashMap.remove(HashMap.java:656)
> org.apache.avro.util.WeakIdentityHashMap.reap(WeakIdentityHashMap.java:140)
> org.apache.avro.util.WeakIdentityHashMap.containsKey(WeakIdentityHashMap.java:58)
> org.apache.avro.LogicalTypes.fromSchemaIgnoreInvalid(LogicalTypes.java:55)
> org.apache.avro.Schema.parse(Schema.java:1318)
> org.apache.avro.Schema.parse(Schema.java:1260)
> org.apache.avro.Schema$Parser.parse(Schema.java:1024)
> org.apache.avro.Schema$Parser.parse(Schema.java:1012)
> org.apache.avro.Schema.parse(Schema.java:1064)
> org.apache.avro.mapred.AvroJob.getInputSchema(AvroJob.java:73)
> org.apache.avro.mapred.AvroRecordReader.<init>(AvroRecordReader.java:41)
> org.apache.avro.mapred.AvroInputFormat.getRecordReader(AvroInputFormat.java:71)
> org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:237)
> org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208)
> org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> The issue is that Schema.parse is not thread-safe, and I have multiple tasks
> calling this method in the same executor.
> See here:
> https://issues.apache.org/jira/browse/AVRO-1773
>
> I believe this will affect spark-avro as well, although I have not tried it
> yet.
> For me this behavior showed up when upgrading from Spark 1.5.1 to 1.6.0; I am
> not sure why it did not manifest itself in Spark 1.5.1.
> Since I cannot reliably override the Avro version that is shipped with Spark
> from my program (or can I? I tried with older Spark and it failed), this means
> I currently cannot use the Avro format except when using only 1 core per
> executor in YARN.
> I believe the fix is to upgrade to Avro 1.8.0.
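
For application code that parses schemas itself, one possible mitigation for the race described above is to funnel every parse call through a single lock, so that only one thread at a time touches the shared, unsynchronized state. The sketch below is illustrative only: SerializedParser and SerializedParserDemo are hypothetical names, and the stand-in parser uses a plain HashMap in place of Avro so that the example runs with the JDK alone. In real use the wrapped function would be the Avro parse call. Note that this does not cover calls made inside Avro's own input format code, as in the stack trace above, which is why the reporter's workaround and the Avro upgrade remain the actual fixes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical wrapper: serializes calls to a parser that is not thread-safe. */
final class SerializedParser<T> {
    // Single lock guarding every call; share one wrapper per JVM (executor).
    private final Object lock = new Object();
    private final Function<String, T> unsafeParse;

    SerializedParser(Function<String, T> unsafeParse) {
        this.unsafeParse = unsafeParse;
    }

    T parse(String json) {
        synchronized (lock) {
            return unsafeParse.apply(json);
        }
    }
}

final class SerializedParserDemo {
    /** Runs the wrapped parser from several threads; returns the number of completed calls. */
    static int run(int threads, int callsPerThread) {
        // Stand-in for the shared, unsynchronized map that the real parser mutates.
        Map<String, Integer> shared = new HashMap<>();
        SerializedParser<Integer> parser = new SerializedParser<>(json -> {
            shared.put(json, json.length());
            return shared.remove(json);
        });

        AtomicInteger completed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.submit(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    parser.parse("schema-" + id + "-" + i);
                    completed.incrementAndGet();
                }
            });
        }

        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                pool.shutdownNow();
                throw new IllegalStateException("parser calls did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000));
    }
}
```

The lock trades parallelism for safety: parsing becomes sequential within the executor, which is usually acceptable because schemas are parsed once per task rather than once per record.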