openinx opened a new issue #1152:
URL: https://github.com/apache/iceberg/issues/1152
When implementing the Flink Avro reader & writer, I created FlinkAvroReader and FlinkAvroWriter by extending the GenericAvroReader & GenericAvroWriter classes. But when I ran the unit tests, the data generated by `org.apache.iceberg.flink.data.RandomData` could not be written into the Avro file, failing with the following:
```
java.lang.ClassCastException: java.time.LocalDate cannot be cast to java.lang.Integer

org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.ClassCastException: java.time.LocalDate cannot be cast to java.lang.Integer
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:317)
    at org.apache.iceberg.avro.AvroFileAppender.add(AvroFileAppender.java:52)
    at org.apache.iceberg.io.FileAppender.addAll(FileAppender.java:32)
    at org.apache.iceberg.io.FileAppender.addAll(FileAppender.java:37)
    at org.apache.iceberg.flink.data.TestFlinkAvroReaderWriter.testCorrectness(TestFlinkAvroReaderWriter.java:52)
    at org.apache.iceberg.flink.data.TestFlinkAvroReaderWriter.testNormalData(TestFlinkAvroReaderWriter.java:71)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassCastException: java.time.LocalDate cannot be cast to java.lang.Integer
    at org.apache.iceberg.avro.ValueWriters$IntegerWriter.write(ValueWriters.java:177)
    at org.apache.iceberg.avro.ValueWriters$StructWriter.write(ValueWriters.java:497)
    at org.apache.iceberg.avro.ValueWriters$OptionWriter.write(ValueWriters.java:398)
    at org.apache.iceberg.avro.ValueWriters$StructWriter.write(ValueWriters.java:497)
    at org.apache.iceberg.avro.ValueWriters$StructWriter.write(ValueWriters.java:497)
    at org.apache.iceberg.avro.GenericAvroWriter.write(GenericAvroWriter.java:50)
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:314)
    ... 54 more
```
I took a closer look at the code and found that `GenericAvroWriter` uses `IntegerWriter` to encode the `date` data type, while `org.apache.iceberg.flink.data.RandomData` generates a `LocalDate` instance for it:
```java
@Override
public Object primitive(Type.PrimitiveType primitive) {
  Object result = randomValue(primitive, random);
  switch (primitive.typeId()) {
    case BINARY:
      return ByteBuffer.wrap((byte[]) result);
    case UUID:
      return UUID.nameUUIDFromBytes((byte[]) result);
    case DATE:
      return EPOCH_DAY.plusDays((Integer) result);
    // ... remaining cases elided
```
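That is exactly the mismatch: `IntegerWriter` expects the raw `int` (days since the Unix epoch), but the generator hands it a `LocalDate`. Below is a minimal sketch of the conversion a `DateWriter`-style writer would perform before delegating to the int encoder (the class and method names here are mine for illustration, not Iceberg's actual API):
```java
import java.time.LocalDate;

// Illustrative only: the conversion a DateWriter-style ValueWriter performs
// so that a LocalDate can be encoded as Avro's `date` logical type
// (an int counting days since 1970-01-01).
public class DateWriteSketch {

  static int toEpochDays(LocalDate date) {
    // IntegerWriter throws the ClassCastException above when handed the
    // LocalDate itself; it needs this int instead.
    return (int) date.toEpochDay();
  }

  public static void main(String[] args) {
    System.out.println(toEpochDays(LocalDate.of(2020, 6, 24))); // 18437
  }
}
```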
It seemed strange that the same data set generated by `RandomData` could be written into Parquet files but not into Avro files, so something had to be wrong.
Finally, I found that we have two Avro reader/writer implementations: one is `org.apache.iceberg.data.avro.DataWriter` and the other is `org.apache.iceberg.avro.GenericAvroWriter`. Most of the code is the same except for several data type mappings (illustrated in the sketch after the list below):
1. `GenericAvroWriter` encodes `date` with `IntegerWriter`, while `DataWriter` encodes it with `DateWriter`;
2. `GenericAvroWriter` encodes `time-micros` with `LongWriter`, while `DataWriter` encodes it with `TimeWriter`;
3. `GenericAvroWriter` encodes `timestamp-micros` with `LongWriter`, while `DataWriter` encodes it with `TimestamptzWriter` or `TimestampWriter`.
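To make the second and third differences concrete (the `date` case is sketched above), here is a hedged, standalone sketch of the values that `TimeWriter`- and `TimestampWriter`-style writers conceptually hand to Avro's long encoder; `GenericAvroWriter` instead expects the caller to supply these raw longs directly. Method names are hypothetical, not Iceberg's actual API:
```java
import java.time.Instant;
import java.time.LocalTime;
import java.time.temporal.ChronoUnit;

public class TimeConversionSketch {

  // time-micros: microseconds since midnight, encoded as a long.
  static long timeToMicros(LocalTime time) {
    return time.toNanoOfDay() / 1000L;
  }

  // timestamp-micros: microseconds since the Unix epoch, encoded as a long.
  static long timestampToMicros(Instant timestamp) {
    return ChronoUnit.MICROS.between(Instant.EPOCH, timestamp);
  }

  public static void main(String[] args) {
    System.out.println(timeToMicros(LocalTime.of(12, 30)));                       // 45000000000
    System.out.println(timestampToMicros(Instant.parse("2020-06-24T00:00:00Z"))); // 1592956800000000
  }
}
```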
So why do we need two such similar readers & writers? That's quite confusing when trying to implement the Flink Avro reader and writer (it seems I need to make `FlinkAvroWriter` extend `DataWriter` instead). Is it possible to remove one of them and just keep the other?
FYI @rdblue.