BELUGA BEHR created HIVE-18956:
----------------------------------
Summary: AvroSerDe Race Condition
Key: HIVE-18956
URL: https://issues.apache.org/jira/browse/HIVE-18956
Project: Hive
Issue Type: Bug
Components: Serializers/Deserializers
Affects Versions: 2.3.2, 3.0.0
Reporter: BELUGA BEHR
{code}
  @Override
  public Writable serialize(Object o, ObjectInspector objectInspector) throws SerDeException {
    if (badSchema) {
      throw new BadSchemaException();
    }
    return getSerializer().serialize(o, objectInspector, columnNames, columnTypes, schema);
  }

  @Override
  public Object deserialize(Writable writable) throws SerDeException {
    if (badSchema) {
      throw new BadSchemaException();
    }
    return getDeserializer().deserialize(columnNames, columnTypes, writable, schema);
  }
...
  private AvroDeserializer getDeserializer() {
    if (avroDeserializer == null) {
      avroDeserializer = new AvroDeserializer();
    }
    return avroDeserializer;
  }

  private AvroSerializer getSerializer() {
    if (avroSerializer == null) {
      avroSerializer = new AvroSerializer();
    }
    return avroSerializer;
  }
{code}
The {{getDeserializer}} and {{getSerializer}} methods are not thread-safe, so
neither are the {{deserialize}} and {{serialize}} methods that call them. This
probably did not matter with MapReduce, but now that we have Spark/Tez, it may
be an issue.
Consider a scenario where three threads all enter {{getSerializer}}, all see
that {{avroSerializer}} is _null_, and each creates its own instance. They then
race to assign their new object to the {{avroSerializer}} field, so the threads
may end up with different instances.
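
One possible way to close the race is sketched below. It is only an illustration, not a proposed patch: {{StubSerializer}}, {{SafeLazyHolder}} and {{LazyInitDemo}} are hypothetical stand-ins rather than Hive classes, and double-checked locking on a volatile field is just one of several options.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for AvroSerializer; not the Hive class.
class StubSerializer {
}

// Hypothetical holder that mirrors the lazy getter from the report,
// with double-checked locking added.
class SafeLazyHolder {
    // volatile so that a fully constructed instance is visible to all threads
    private volatile StubSerializer serializer;

    StubSerializer getSerializer() {
        StubSerializer local = serializer;   // single volatile read on the fast path
        if (local == null) {
            synchronized (this) {
                local = serializer;          // re-check while holding the lock
                if (local == null) {
                    local = new StubSerializer();
                    serializer = local;
                }
            }
        }
        return local;
    }
}

class LazyInitDemo {
    // Calls getSerializer() from several threads at once and returns
    // how many distinct instances the threads observed.
    static int countDistinctInstances(int threads) {
        SafeLazyHolder holder = new SafeLazyHolder();
        // StubSerializer keeps identity equals/hashCode, so this set counts instances
        Set<StubSerializer> seen = ConcurrentHashMap.newKeySet();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();           // release all threads together
                    seen.add(holder.getSerializer());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        start.countDown();
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Three threads, as in the scenario described above.
        System.out.println(countDistinctInstances(3));
    }
}
```

With the lock and the second null check, only one thread can construct the instance, so every caller gets the same object. If {{AvroSerializer}} and {{AvroDeserializer}} are cheap to construct, creating them eagerly in final fields might be simpler still, though that has not been checked against the actual classes.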
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)