rdblue edited a comment on issue #1152:
URL: https://github.com/apache/iceberg/issues/1152#issuecomment-653185375


   Readers and writers are specific to an in-memory representation, which I'll 
interchangeably refer to as an object model. The Avro readers and writers 
you're using here are for different object models: Iceberg generics and Avro.
   
   The `data.avro.DataReader` and `data.avro.DataWriter` classes were written 
for the Iceberg generics data model. That model uses Iceberg's generic record 
class, Java 8 date/time types, `BigDecimal`, `ByteBuffer`, and `byte[]`. This 
representation is intended for application authors working directly with the 
Iceberg API. That's why it uses standard Java representations for most types.
   
   The `avro.GenericAvroReader` and `avro.GenericAvroWriter` classes are for 
working with Avro's generic or specific records. That's why they produce 
`GenericData.Record` or instances of specific classes, all of which implement 
Avro's `IndexedRecord`. These classes are intended for internal use -- internal 
implementations of `DataFile` and `ManifestFile` use `IndexedRecord` -- so they 
produce the internal value representations, like `BigDecimal`, long 
microseconds from epoch, or int days from epoch.
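
   To make the difference concrete, here is a minimal sketch of how the two 
representations of dates and timestamps relate. `ValueRepresentations` and its 
method names are hypothetical and written only for this comment; they are not 
Iceberg classes, and the sketch assumes timestamps are interpreted in UTC.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Hypothetical helper for illustration only; not part of Iceberg or Avro.
final class ValueRepresentations {
  private static final OffsetDateTime EPOCH =
      Instant.ofEpochSecond(0).atOffset(ZoneOffset.UTC);
  private static final LocalDate EPOCH_DAY = EPOCH.toLocalDate();

  private ValueRepresentations() {}

  // Internal form (int days from epoch) -> Java 8 date
  static LocalDate dateFromDays(int daysFromEpoch) {
    return EPOCH_DAY.plusDays(daysFromEpoch);
  }

  // Java 8 date -> internal form (int days from epoch)
  static int daysFromDate(LocalDate date) {
    return (int) ChronoUnit.DAYS.between(EPOCH_DAY, date);
  }

  // Internal form (long microseconds from epoch) -> Java 8 timestamp in UTC
  static OffsetDateTime timestamptzFromMicros(long microsFromEpoch) {
    return EPOCH.plus(microsFromEpoch, ChronoUnit.MICROS);
  }

  // Java 8 timestamp -> internal form (long microseconds from epoch)
  static long microsFromTimestamptz(OffsetDateTime timestamp) {
    return ChronoUnit.MICROS.between(EPOCH, timestamp);
  }

  public static void main(String[] args) {
    // Print the Java 8 view of two internally encoded values.
    System.out.println(dateFromDays(1));
    System.out.println(timestamptzFromMicros(1_000_000L));
  }
}
```

   A reader for the generics model would hand back the `LocalDate` or 
`OffsetDateTime` side of these conversions, while a reader for the internal 
model would hand back the raw int or long.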
   
   Unfortunately, early on we left the generic Avro reader/writer 
implementation public in core, and there are some downstream uses, like the 
original Netflix Flink sink. I think we should eventually remove Avro from the 
public API. I would also like to remove it from the internals and make 
`DataFile` and `ManifestFile` implement our own `Record` API, but for that we 
would need a reader that produces the internal value representations.
   
   For Flink, we should build reader/writer classes that produce and consume 
its in-memory representation. That's what we do for Spark and Pig. Based on the 
Parquet support in #1125, I thought that Flink uses Java 8 date/time classes, 
BigDecimal, and ByteBuffer, so it would make sense to base the readers on 
Iceberg generics. In that case, basing your implementation on 
`data.avro.DataReader` and `data.avro.DataWriter` should work fine.
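
   As a rough sketch of what "a reader per object model" means, the same 
stored date column can be materialized differently depending on the reader 
that is plugged in. `DateReader` and `ObjectModelSketch` below are hypothetical 
names made up for this illustration; they are not Iceberg, Avro, or Flink APIs, 
and an `int[]` stands in for the encoded column.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface for illustration; not an Iceberg, Avro, or Flink API.
interface DateReader<T> {
  T read(int storedDaysFromEpoch);
}

final class ObjectModelSketch {
  private ObjectModelSketch() {}

  // Internal-style object model: keep the stored int days from epoch.
  static final DateReader<Integer> INTERNAL = days -> days;

  // Generics-style object model: materialize a Java 8 LocalDate.
  static final DateReader<LocalDate> JAVA_TIME = days -> LocalDate.ofEpochDay(days);

  // The stored values are the same; only the reader decides the in-memory type.
  static <T> List<T> readColumn(int[] storedDays, DateReader<T> reader) {
    List<T> values = new ArrayList<>(storedDays.length);
    for (int days : storedDays) {
      values.add(reader.read(days));
    }
    return values;
  }

  public static void main(String[] args) {
    int[] column = {0, 1, 365};
    System.out.println(readColumn(column, INTERNAL));
    System.out.println(readColumn(column, JAVA_TIME));
  }
}
```

   An engine-specific reader would be a third implementation that returns 
whatever type that engine expects for dates. If the engine already expects 
Java 8 types, the generics-style reader can be reused as the starting point.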




