openinx opened a new issue #1578: URL: https://github.com/apache/iceberg/issues/1578
While writing a few unit tests for https://github.com/apache/iceberg/pull/1477/files, I found that the encode/decode round trip would not pass because of an `AvroSchemaUtil` conversion issue. The test is easy to understand:

```java
package org.apache.iceberg.avro;

import java.io.IOException;
import java.util.List;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.junit.Assert;

public class TestAvroEncoderUtil extends AvroDataTest {

  @Override
  protected void writeAndValidate(org.apache.iceberg.Schema schema) throws IOException {
    List<GenericData.Record> expected = RandomAvroData.generate(schema, 100, 1990L);
    Schema avroSchema = AvroSchemaUtil.convert(schema.asStruct());

    for (GenericData.Record record : expected) {
      // Encode, decode, then re-encode; the two serialized forms should be identical.
      byte[] serializedData = AvroEncoderUtil.encode(record, avroSchema);
      GenericData.Record expectedRecord = AvroEncoderUtil.decode(serializedData);
      byte[] serializedData2 = AvroEncoderUtil.encode(expectedRecord, avroSchema);
      Assert.assertArrayEquals(serializedData2, serializedData);
    }
  }
}
```

After digging into this issue, I found that the cause is [here](https://github.com/apache/iceberg/commit/d8cecc411daf16955963766fa6336d4260e7c797#diff-192650b1711edcd50a73986ec880528cR144). For example, if we convert this simple Iceberg schema to an Avro schema:

```java
Schema schema = new Schema(
    required(0, "id", Types.LongType.get()),
    optional(1, "data", Types.MapType.ofOptional(2, 3, Types.LongType.get(), Types.StringType.get())));
org.apache.avro.Schema avroSchema = AvroSchemaUtil.convert(schema.asStruct());
System.out.println(avroSchema.toString(true));
```

we get:

```json
{
  "type" : "record",
  "name" : "rnull",
  "fields" : [ {
    "name" : "id",
    "type" : "long",
    "field-id" : 0
  }, {
    "name" : "data",
    "type" : [ "null", {
      "type" : "array",   // <- it adds an array here, which is quite confusing
      "items" : {
        "type" : "record",
        "name" : "k2_v3",
        "fields" : [ {
          "name" : "key",
          "type" : "long",
          "field-id" : 2
        }, {
          "name" : "value",
          "type" : [ "null", "string" ],
          "default" : null,
          "field-id" : 3
        } ]
      },
      "logicalType" : "map"
    } ],
    "default" : null,
    "field-id" : 1
  } ]
}
```

In my understanding, it should be this JSON instead:

```json
{
  "type" : "record",
  "name" : "rnull",
  "fields" : [ {
    "name" : "id",
    "type" : "long",
    "field-id" : 0
  }, {
    "name" : "data",
    "type" : [ "null", {
      "type" : "record",
      "name" : "k2_v3",
      "fields" : [ {
        "name" : "key",
        "type" : "long",
        "field-id" : 2
      }, {
        "name" : "value",
        "type" : [ "null", "string" ],
        "default" : null,
        "field-id" : 3
      } ],
      "logicalType" : "map"
    } ],
    "default" : null,
    "field-id" : 1
  } ]
}
```

What is the reason for converting it this way? I don't quite understand the message of commit https://github.com/apache/iceberg/commit/d8cecc411daf16955963766fa6336d4260e7c797 either. @rdblue
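For comparison, here is a hypothetical variant of the snippet above with a string-keyed map (my own example, not from the PR). Since Avro's native map type only supports string keys, I would expect this one to convert to a plain Avro `map`, which the long-keyed map above apparently cannot use:

```java
// Same schema as above, but with STRING map keys instead of LONG. Avro's
// built-in map type only allows string keys, so this variant can convert to
// a native Avro map; the long-keyed version above has no native equivalent.
Schema stringKeyed = new Schema(
    required(0, "id", Types.LongType.get()),
    optional(1, "data", Types.MapType.ofOptional(2, 3, Types.StringType.get(), Types.StringType.get())));
System.out.println(AvroSchemaUtil.convert(stringKeyed.asStruct()).toString(true));
```

If that prints a plain `{"type" : "map", "values" : [...]}` union member, it would suggest the array-of-key/value-records form is specifically a workaround for non-string keys.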

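For readers without the PR checked out: `AvroEncoderUtil.decode` in the test above takes only the bytes, so the schema must travel with the payload. Below is a minimal sketch of such a schema-embedding round trip using only standard Avro container-file APIs; it is my own illustration of the idea, not the actual implementation in PR #1477.

```java
package org.apache.iceberg.avro;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;

// Hypothetical sketch of a schema-embedding encode/decode round trip,
// similar in spirit to AvroEncoderUtil but not the real implementation.
public class AvroRoundTripSketch {

  // Serialize a record together with its schema using Avro's container
  // format, so that decode() needs no separate schema argument.
  static byte[] encode(GenericData.Record record, Schema schema) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (DataFileWriter<GenericData.Record> writer =
        new DataFileWriter<>(new GenericDatumWriter<>(schema))) {
      writer.create(schema, out);
      writer.append(record);
    }
    return out.toByteArray();
  }

  // Read the record back; the schema is recovered from the container header.
  static GenericData.Record decode(byte[] data) throws IOException {
    try (DataFileStream<GenericData.Record> reader =
        new DataFileStream<>(new ByteArrayInputStream(data), new GenericDatumReader<>())) {
      return reader.next();
    }
  }
}
```

With helpers shaped like these, the encode → decode → encode assertion in the test is byte-for-byte stable as long as the schema conversion itself round-trips cleanly.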