[ 
https://issues.apache.org/jira/browse/AVRO-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115961#comment-14115961
 ] 

Sachin Goyal commented on AVRO-1562:
------------------------------------

My 2c:
Our decision to include this feature should be based on its complexity and 
its performance penalty.
If there is a clean way to support a feature, it should be considered even if 
the feature itself is complex.

Note that the current patch does not claim to be clean or performant, and 
there may be scope to improve it further.
But it would be good to understand whether it's really too complex to support.
If it is too complex, or if it makes Avro less performant, then we should not 
fix it.

IMHO, lots of Java designs would become usable with Avro, Hadoop and Hive with 
this fix.
That would be a good incentive to analyze the complexity.
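
As a rough gauge of that complexity, here is a minimal JDK-only sketch (no 
Avro involved) of the detection step a fix would probably need for the classes 
in the quoted example: noticing that a class is a Map even though it is not a 
ParameterizedType, and recovering its key/value types from the superclass 
chain. MapTypeProbe and mapTypeArguments are hypothetical names used only for 
illustration; they are not taken from the attached patch or from Avro.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class MapTypeProbe {
    // Same shape as the classes in the quoted issue.
    static class MapDerived extends HashMap<String, Integer> {
        Integer a = 1;
        String b = "b";
    }

    static class MapDerived2 extends MapDerived {
        String c = "c";
    }

    /**
     * Walks up the superclass chain and returns the type arguments of the
     * first parameterized superclass that is a Map, or null if none is found.
     * Assumption: the arguments are concrete classes. Type variables and
     * classes that implement Map directly are not handled in this sketch.
     */
    static Type[] mapTypeArguments(Class<?> c) {
        for (Class<?> k = c; k != null; k = k.getSuperclass()) {
            Type sup = k.getGenericSuperclass();
            if (sup instanceof ParameterizedType) {
                ParameterizedType p = (ParameterizedType) sup;
                Type raw = p.getRawType();
                if (raw instanceof Class && Map.class.isAssignableFrom((Class<?>) raw)) {
                    return p.getActualTypeArguments();
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Type declared = MapDerived2.class;
        // The declared type is a plain Class, not a ParameterizedType ...
        System.out.println(declared instanceof ParameterizedType);
        // ... yet every instance is a java.util.Map at runtime.
        System.out.println(new MapDerived2() instanceof Map);
        // The key/value types are still recoverable from the hierarchy.
        System.out.println(Arrays.toString(mapTypeArguments(MapDerived2.class)));
    }
}
```

The lookup is one short loop, so the detection itself looks cheap. The harder 
questions are likely to be how the result is represented in the schema and 
what it costs on the read and write paths.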

> Add support for types extending Maps/Collections
> ------------------------------------------------
>
>                 Key: AVRO-1562
>                 URL: https://issues.apache.org/jira/browse/AVRO-1562
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.7.6
>            Reporter: Sachin Goyal
>         Attachments: custom_map_and_collections1.patch
>
>
> Consider the following code:
> {code}
> import java.io.ByteArrayOutputStream;
> import java.util.*;
> import org.apache.avro.Schema;
> import org.apache.avro.file.DataFileWriter;
> import org.apache.avro.reflect.ReflectData;
> import org.apache.avro.reflect.ReflectDatumWriter;
> public class AvroDerivingMaps
> {
>     public static void main (String [] args) throws Exception
>     {
>         MapDerivedContainer orig = new MapDerivedContainer();
>         ReflectData rdata = ReflectData.AllowNull.get();
>         Schema schema = rdata.getSchema(MapDerivedContainer.class);
>         System.out.println(schema);
>         
>         ReflectDatumWriter<MapDerivedContainer> datumWriter =
>             new ReflectDatumWriter<MapDerivedContainer>(MapDerivedContainer.class, rdata);
>         DataFileWriter<MapDerivedContainer> fileWriter =
>             new DataFileWriter<MapDerivedContainer>(datumWriter);
>         ByteArrayOutputStream baos = new ByteArrayOutputStream();
>         fileWriter.create(schema, baos);
>         fileWriter.append(orig);
>         fileWriter.close();
>     }
> }
> class MapDerived extends HashMap<String, Integer>
> {
>     Integer a = 1;
>     String b = "b";
> }
> class MapDerivedContainer
> {
>     MapDerived2 map = new MapDerived2();
> }
> class MapDerived2 extends MapDerived
> {
>     String c = "c";
> }
> {code}
> \\
> \\
> It prints the following schema and then throws the exception shown after it:
> {code:javascript}
> {"type":"record","name":"MapDerivedContainer","namespace":"avro","fields":[{"name":"map","type":["null",{"type":"record","name":"MapDerived2","fields":[{"name":"c","type":["null","string"],"default":null},{"name":"a","type":["null","int"],"default":null},{"name":"b","type":["null","string"],"default":null}]}],"default":null}]}
> {code}
> {color:brown}
> Exception in thread "main" 
> org.apache.avro.file.DataFileWriter$AppendWriteException:
> org.apache.avro.UnresolvedUnionException: 
> Caused by: org.apache.avro.UnresolvedUnionException: Not in union 
> ["null",{"type":"record","name":"MapDerived2","namespace":"avro","fields":[{"name":"c","type":["null","string"],"default":null},{"name":"a","type":["null","int"],"default":null},{"name":"b","type":["null","string"],"default":null}]}]:
>  {}
>       at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:600)
>       at org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:151)
>       at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:71)
>       at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:145)
>       at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
>       at org.apache.avro.reflect.ReflectDatumWriter.writeField(ReflectDatumWriter.java:203)
>       at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
>       at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
>       at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:145)
>       at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
>       at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:290)
>       ... 1 more
> {color}
> \\
> \\
> It appears that ReflectData#createSchema() only handles a type as a map when 
> "type instanceof ParameterizedType" holds. MapDerived2 is a plain class, so 
> the map handling is skipped and a record schema is generated instead.
> GenericData#isMap() has no such restriction, so at write time the value is 
> treated as a map and GenericData#resolveUnion() fails, because the union 
> contains no map schema.
> The same may be true for classes extending ArrayList, Collection, Set etc.
> Also, note the schema for the class extending Map:
> {code:javascript}
> {  
>    "type":"record",
>    "name":"MapDerived2",
>    "fields":[  
>       {  
>          "name":"c",
>          "type":[  
>             "null",
>             "string"
>          ],
>          "default":null
>       },
>       {  
>          "name":"a",
>          "type":[  
>             "null",
>             "int"
>          ],
>          "default":null
>       },
>       {  
>          "name":"b",
>          "type":[  
>             "null",
>             "string"
>          ],
>          "default":null
>       }
>    ]
> }
> {code}
> This schema ignores the Map completely.
> Probably, for such a class, the schema should look like:
> {code:javascript}
> {
>    "type":"record",
>    "name":"MapDerived2",
>    "fields":[  
>       {  
>          "name":"c",
>          "type":[  
>             "null",
>             "string"
>          ],
>          "default":null
>       },
>       .... // Other fields in the class extending the Map
>       {
>          "name":"BASE_MAP",
>          "type":[
>             "null",
>             "map" ... // Normal map which the class extends (implements?)
>          ],
>          "default":null
>       }
>    ]
> }
> {code}
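
To make the proposed layout concrete, here is a minimal JDK-only sketch of how 
a Map-derived object could be split into its own fields plus a single BASE_MAP 
entry, mirroring the schema suggested in the quoted description. It does not 
use Avro and is not the attached patch. BaseMapSketch and toRecord are 
hypothetical names, and skipping every class under java.* is an assumption 
made only to keep HashMap's internal fields out of the result.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class BaseMapSketch {
    // Same shape as the classes in the quoted issue.
    static class MapDerived extends HashMap<String, Integer> {
        Integer a = 1;
        String b = "b";
    }

    static class MapDerived2 extends MapDerived {
        String c = "c";
    }

    // Field name taken from the proposed schema in the issue description.
    static final String BASE_MAP = "BASE_MAP";

    /**
     * Collects the instance fields declared by the user's own classes and
     * adds one BASE_MAP entry holding a copy of the map contents.
     */
    static Map<String, Object> toRecord(Map<?, ?> datum) {
        Map<String, Object> record = new LinkedHashMap<>();
        // Assumption: anything under java.* is library code and is skipped.
        for (Class<?> k = datum.getClass();
             k != null && !k.getName().startsWith("java.");
             k = k.getSuperclass()) {
            for (Field f : k.getDeclaredFields()) {
                int mods = f.getModifiers();
                if (Modifier.isStatic(mods) || Modifier.isTransient(mods) || f.isSynthetic()) {
                    continue;
                }
                try {
                    f.setAccessible(true);
                    record.put(f.getName(), f.get(datum));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("cannot read field " + f.getName(), e);
                }
            }
        }
        // The inherited map entries travel as one ordinary map value.
        record.put(BASE_MAP, new LinkedHashMap<Object, Object>(datum));
        return record;
    }

    public static void main(String[] args) {
        MapDerived2 m = new MapDerived2();
        m.put("x", 42);
        System.out.println(toRecord(m));
    }
}
```

With this split, the declared fields a, b and c keep their record-style 
handling, and the entries inherited from HashMap are carried separately under 
BASE_MAP, so neither part is silently dropped.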



--
This message was sent by Atlassian JIRA
(v6.2#6252)
