openinx opened a new issue #1152:
URL: https://github.com/apache/iceberg/issues/1152
When implementing the Flink Avro reader & writer, I created FlinkAvroReader and FlinkAvroWriter by extending the GenericAvroReader & GenericAvroWriter classes. But when I ran the unit tests, the data generated by `org.apache.iceberg.flink.data.RandomData` could not be written into the Avro file, failing with the following:
```
java.lang.ClassCastException: java.time.LocalDate cannot be cast to java.lang.Integer

org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.ClassCastException: java.time.LocalDate cannot be cast to java.lang.Integer
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:317)
    at org.apache.iceberg.avro.AvroFileAppender.add(AvroFileAppender.java:52)
    at org.apache.iceberg.io.FileAppender.addAll(FileAppender.java:32)
    at org.apache.iceberg.io.FileAppender.addAll(FileAppender.java:37)
    at org.apache.iceberg.flink.data.TestFlinkAvroReaderWriter.testCorrectness(TestFlinkAvroReaderWriter.java:52)
    at org.apache.iceberg.flink.data.TestFlinkAvroReaderWriter.testNormalData(TestFlinkAvroReaderWriter.java:71)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassCastException: java.time.LocalDate cannot be cast to java.lang.Integer
    at org.apache.iceberg.avro.ValueWriters$IntegerWriter.write(ValueWriters.java:177)
    at org.apache.iceberg.avro.ValueWriters$StructWriter.write(ValueWriters.java:497)
    at org.apache.iceberg.avro.ValueWriters$OptionWriter.write(ValueWriters.java:398)
    at org.apache.iceberg.avro.ValueWriters$StructWriter.write(ValueWriters.java:497)
    at org.apache.iceberg.avro.ValueWriters$StructWriter.write(ValueWriters.java:497)
    at org.apache.iceberg.avro.GenericAvroWriter.write(GenericAvroWriter.java:50)
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:314)
    ... 54 more
```
I took a closer look at the code and found that `GenericAvroWriter` uses `IntegerWriter` to encode the `date` data type, while `org.apache.iceberg.flink.data.RandomData` generates a `LocalDate` instance for it:
```java
@Override
public Object primitive(Type.PrimitiveType primitive) {
  Object result = randomValue(primitive, random);
  switch (primitive.typeId()) {
    case BINARY:
      return ByteBuffer.wrap((byte[]) result);
    case UUID:
      return UUID.nameUUIDFromBytes((byte[]) result);
    case DATE:
      return EPOCH_DAY.plusDays((Integer) result);
    // ... remaining cases elided
```
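That is exactly the mismatch: `IntegerWriter` expects the raw `int` (days since the Unix epoch), but the generator hands it a `LocalDate`. Below is a minimal sketch of the conversion a `DateWriter`-style writer would perform before delegating to the int encoder (the class and method names here are mine for illustration, not Iceberg's actual API):
```java
import java.time.LocalDate;

// Illustrative only: the conversion a DateWriter-style ValueWriter performs
// so that a LocalDate can be encoded as Avro's `date` logical type
// (an int counting days since 1970-01-01).
public class DateWriteSketch {

  static int toEpochDays(LocalDate date) {
    // IntegerWriter throws the ClassCastException above when handed the
    // LocalDate itself; it needs this int instead.
    return (int) date.toEpochDay();
  }

  public static void main(String[] args) {
    System.out.println(toEpochDays(LocalDate.of(2020, 6, 24))); // 18437
  }
}
```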
It seemed strange that the same data set generated by `RandomData` could be written into Parquet files but not into Avro files, so something had to be wrong.
Finally, I found that we have two Avro reader/writer implementations: one is `org.apache.iceberg.data.avro.DataWriter` and the other is `org.apache.iceberg.avro.GenericAvroWriter`. Most of the code is the same except for several data type mappings (illustrated in the sketch after the list below):
1. `GenericAvroWriter` encodes `date` with `IntegerWriter`, while `DataWriter` encodes it with `DateWriter`;
2. `GenericAvroWriter` encodes `time-micros` with `LongWriter`, while `DataWriter` encodes it with `TimeWriter`;
3. `GenericAvroWriter` encodes `timestamp-micros` with `LongWriter`, while `DataWriter` encodes it with `TimestamptzWriter` or `TimestampWriter`.
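To make the second and third differences concrete (the `date` case is sketched above), here is a hedged, standalone sketch of the values that `TimeWriter`- and `TimestampWriter`-style writers conceptually hand to Avro's long encoder; `GenericAvroWriter` instead expects the caller to supply these raw longs directly. Method names are hypothetical, not Iceberg's actual API:
```java
import java.time.Instant;
import java.time.LocalTime;
import java.time.temporal.ChronoUnit;

public class TimeConversionSketch {

  // time-micros: microseconds since midnight, encoded as a long.
  static long timeToMicros(LocalTime time) {
    return time.toNanoOfDay() / 1000L;
  }

  // timestamp-micros: microseconds since the Unix epoch, encoded as a long.
  static long timestampToMicros(Instant timestamp) {
    return ChronoUnit.MICROS.between(Instant.EPOCH, timestamp);
  }

  public static void main(String[] args) {
    System.out.println(timeToMicros(LocalTime.of(12, 30)));                       // 45000000000
    System.out.println(timestampToMicros(Instant.parse("2020-06-24T00:00:00Z"))); // 1592956800000000
  }
}
```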
So why do we need two such similar readers & writers? That's quite confusing when trying to implement the Flink Avro reader and writer (it seems I need to make `FlinkAvroWriter` extend `DataWriter` instead). Is it possible to remove one of them and just keep the other?
FYI @rdblue.