[
https://issues.apache.org/jira/browse/AVRO-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15306898#comment-15306898
]
Mikko Kupsu commented on AVRO-1855:
-----------------------------------
[~rdblue], I have to admit that I don't have the slightest clue. I only have
guesses.
I know that the values in the _headers_ field are *definitely* Strings, since
I'm writing them with a class generated from the Avro schema, and that class
has the field schema from the issue description. I'm even converting them
explicitly to Strings when reading them as a _Map_ from a _GenericRecord_:
{code:java}
// record.get(...) returns Object, so the map type needs an (unchecked) cast
setHeaders((Map<CharSequence, CharSequence>) record.get("headers"));
{code}
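A cast like the one above is unchecked and does not change the runtime type of the entries. A minimal sketch of an explicit copy into java.lang.String follows, assuming the field is read as a plain Object from the record; the class and method names (HeaderMaps, toStringMap) are hypothetical, and StringBuilder only stands in for a non-String CharSequence such as Avro's Utf8.

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class HeaderMaps {

    /**
     * Hypothetical helper: copies the value of a map-typed field into a
     * Map<String, String>. Returns null when the field itself is null (the
     * "null" branch of the union). Keys and values are converted with
     * toString(), so any CharSequence implementation ends up as a String.
     */
    static Map<String, String> toStringMap(Object raw) {
        if (raw == null) {
            return null;
        }
        if (!(raw instanceof Map)) {
            throw new IllegalArgumentException(
                    "expected a Map but got " + raw.getClass().getName());
        }
        Map<?, ?> source = (Map<?, ?>) raw;
        Map<String, String> copy = new HashMap<>();
        for (Map.Entry<?, ?> e : source.entrySet()) {
            // Objects.toString(v, null) keeps a null value as null instead of "null".
            copy.put(String.valueOf(e.getKey()), Objects.toString(e.getValue(), null));
        }
        return copy;
    }

    public static void main(String[] args) {
        // StringBuilder stands in for a CharSequence that is not a String (e.g. Utf8).
        Map<CharSequence, CharSequence> raw = new HashMap<>();
        raw.put(new StringBuilder("host"), new StringBuilder("108.175.39.172"));
        raw.put("response_status_code", "206");

        Map<String, String> headers = toStringMap(raw);
        System.out.println(headers.get("response_status_code")); // prints 206
        System.out.println(headers.get("host"));                 // prints 108.175.39.172
    }
}
{code}

Whether the result can be passed straight to setHeaders depends on the setter's parameter type, which isn't shown here (a generated class may expect Map<CharSequence, CharSequence>, which a Map<String, String> is not assignable to).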
I think something gets the type of the Map wrong, since whenever I see this
error there is *always* a key-value pair whose value can be interpreted as a
Number. In the exception from the issue description it's
_response_status_code_ (206). However, the reverse doesn't hold: some datums
with similar fields go through fine.
I don't know how Avro deduces the type of a Map, but could the first value it
gets affect the resulting type?
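One way to test the number-value guess is to scan the map just before it is written and report any entry whose key or value is not a CharSequence. This is a hypothetical diagnostic (HeaderCheck and nonStringEntries are invented names, and the Integer in main is made-up input, not data from the failing job); it does not reproduce Avro's union resolution, and whether a non-string value would actually surface as an UnresolvedUnionException is not established here.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HeaderCheck {

    /**
     * Hypothetical diagnostic: describes every entry whose key or value is not
     * a CharSequence, i.e. every entry that does not fit
     * {"type": "map", "values": "string"}.
     */
    static List<String> nonStringEntries(Map<?, ?> headers) {
        List<String> problems = new ArrayList<>();
        if (headers == null) {
            return problems; // null is allowed by the ["null", map] union
        }
        for (Map.Entry<?, ?> e : headers.entrySet()) {
            Object k = e.getKey();
            Object v = e.getValue();
            if (!(k instanceof CharSequence) || !(v instanceof CharSequence)) {
                problems.add(k + " -> " + (v == null ? "null" : v.getClass().getName()));
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<Object, Object> headers = new HashMap<>();
        headers.put("accept", "*/*");
        headers.put("response_status_code", 206); // made-up input: an Integer slipped in
        System.out.println(nonStringEntries(headers));
        // prints [response_status_code -> java.lang.Integer]
    }
}
{code}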
> Avro-mapred not evaluating map schema correctly when values are expected to
> be strings
> --------------------------------------------------------------------------------------
>
> Key: AVRO-1855
> URL: https://issues.apache.org/jira/browse/AVRO-1855
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.8.0
> Reporter: Mikko Kupsu
> Priority: Critical
> Attachments: 20160530_AVRO-1855.patch
>
>
> When reading a bunch of Avro files and concatenating them using avro-mapred,
> there is an issue with the following schema definition line:
> {code}
> {"name": "headers", "type": ["null", {"type": "map", "values": "string"}]},
> {code}
> The following exception is thrown:
> {code}
> Caused by: org.apache.avro.UnresolvedUnionException: Not in union ["null",{"type":"map","values":"string"}]: {range=bytes=91553252-91557347, accept=*/*, response_status_code=206, host=108.175.39.172}
>     at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:709)
>     at org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:192)
>     at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:110)
>     at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73)
>     at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:150)
>     at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:153)
>     at org.apache.avro.reflect.ReflectDatumWriter.writeField(ReflectDatumWriter.java:182)
>     at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:143)
>     at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:105)
>     at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73)
>     at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:150)
>     at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:60)
>     at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:302)
> {code}
> I've fixed this in my own [GitHub
> fork|https://github.com/mikkokupsu/avro/tree/hotfix/20160530/avro-schema-map-string-problem]
> and I've attached the patch too.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)