koert kuipers created SPARK-13246:
-------------------------------------
Summary: Avro 1.7.7 Schema.parse race condition hangs task
Key: SPARK-13246
URL: https://issues.apache.org/jira/browse/SPARK-13246
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.6.0
Environment: Spark 1.6.0 with YARN and Hadoop provided, running on CDH 5.5
Reporter: koert kuipers
I noticed that a job reading Avro files had some tasks that never
finished. A thread dump shows them stuck in:
java.util.HashMap.removeEntryForKey(HashMap.java:690)
java.util.HashMap.remove(HashMap.java:656)
org.apache.avro.util.WeakIdentityHashMap.reap(WeakIdentityHashMap.java:140)
org.apache.avro.util.WeakIdentityHashMap.containsKey(WeakIdentityHashMap.java:58)
org.apache.avro.LogicalTypes.fromSchemaIgnoreInvalid(LogicalTypes.java:55)
org.apache.avro.Schema.parse(Schema.java:1318)
org.apache.avro.Schema.parse(Schema.java:1260)
org.apache.avro.Schema$Parser.parse(Schema.java:1024)
org.apache.avro.Schema$Parser.parse(Schema.java:1012)
org.apache.avro.Schema.parse(Schema.java:1064)
org.apache.avro.mapred.AvroJob.getInputSchema(AvroJob.java:73)
org.apache.avro.mapred.AvroRecordReader.<init>(AvroRecordReader.java:41)
org.apache.avro.mapred.AvroInputFormat.getRecordReader(AvroInputFormat.java:71)
org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:237)
org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208)
org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
The issue is that Schema.parse is not thread safe, and I have multiple tasks
calling this method concurrently in the same executor.
See here:
https://issues.apache.org/jira/browse/AVRO-1773
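To make the failure mode concrete, here is a minimal stdlib-only sketch of the hazard class: several threads mutating a plain java.util.HashMap with no synchronization, which is the same pattern AVRO-1773 describes inside WeakIdentityHashMap.reap(). This is an illustration of the racy pattern, not a reproduction against Avro itself; corrupted bucket chains are what can make HashMap.removeEntryForKey spin forever. The bounded join keeps the demo itself from hanging:

```java
import java.util.HashMap;
import java.util.Map;

public class UnsafeMapRace {
    public static void main(String[] args) throws Exception {
        final Map<Integer, Integer> map = new HashMap<>();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 100_000; j++) {
                    map.put(j % 512, j);    // concurrent structural changes...
                    map.remove(j % 512);    // ...may corrupt the bucket chains
                }
            });
            workers[i].setDaemon(true);     // let the JVM exit even if a worker hangs
            workers[i].start();
        }
        for (Thread w : workers) w.join(2_000);  // bounded wait: a hang is possible
        System.out.println("done");              // reached whether or not workers hung
    }
}
```

Whether a given run corrupts the map, loses entries, or hangs is nondeterministic, which matches the symptom above: only some tasks get stuck.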
I believe this will affect spark-avro as well, although I have not tried it yet.
For me this behavior showed up when upgrading from Spark 1.5.1 to 1.6.0; I am
not sure why it did not manifest itself in Spark 1.5.1.
Since I cannot reliably override the Avro version that ships with Spark
from my program (or can I? I tried with an older Spark and it failed), I
currently cannot use the Avro format except when using only 1 core per executor
on YARN.
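As a stopgap until the Avro upgrade, the only safe option seems to be serializing all schema parsing through one JVM-wide lock. A minimal sketch of that shape, where PARSE_LOCK, parse(), and CACHE are stand-ins I made up for illustration (not Avro's API):

```java
import java.util.HashMap;
import java.util.Map;

public class SchemaParseLock {
    // Hypothetical workaround: funnel every schema parse through one
    // JVM-wide lock so a shared, unsynchronized map (like Avro 1.7.7's
    // WeakIdentityHashMap) is never mutated concurrently.
    private static final Object PARSE_LOCK = new Object();
    private static final Map<String, String> CACHE = new HashMap<>();

    static String parse(String schemaJson) {
        synchronized (PARSE_LOCK) {  // one "Schema.parse" at a time per JVM
            return CACHE.computeIfAbsent(schemaJson, s -> s.trim());
        }
    }

    public static void main(String[] args) throws Exception {
        Thread[] tasks = new Thread[8];  // simulate 8 cores per executor
        for (int i = 0; i < tasks.length; i++) {
            tasks[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) parse("schema-" + (j % 16));
            });
            tasks[i].start();
        }
        for (Thread t : tasks) t.join();
        // With the lock the shared map stays consistent: exactly 16 entries.
        System.out.println(CACHE.size());
    }
}
```

In a real job this would mean wrapping whatever code path reaches Schema.parse (here, record-reader creation via AvroInputFormat), which is awkward since that call happens inside Spark/Avro, not user code; hence the 1-core-per-executor limitation above.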
I believe the fix is to upgrade to Avro 1.8.0.