stevenzwu commented on issue #1643:
URL: https://github.com/apache/iceberg/issues/1643#issuecomment-714811863


   Regarding having `IcebergSourceSplitReader<T>` copy RowData from the 
iterator into a fixed-size array, let's first discuss some of the 
implementation details.
   
   - How would `IcebergSourceSplitReader<T>` know how to create a reusable 
empty object array for a generic type T?
   - How would `IcebergSourceSplitReader<T>` know how to clone the generic type 
T returned by DataIterator.next()? Neither Flink RowData nor Avro GenericRecord 
implements the `Cloneable` interface.
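
   To make the two questions above concrete, here is a minimal sketch of one 
possible answer: the caller hands the reader an array factory and a copy 
function for T. `BatchCopier` and its parameters are hypothetical names, not 
existing Iceberg or Flink classes, and `StringBuilder` only stands in for a 
mutable record type such as RowData.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.IntFunction;
import java.util.function.UnaryOperator;

/** Hypothetical helper (not an Iceberg/Flink class): copies records into a reusable array. */
final class BatchCopier<T> {
  private final T[] batch;
  private final UnaryOperator<T> copier;

  // Java cannot do `new T[n]`, so the caller must supply the array factory,
  // and since T is not Cloneable, the caller must also supply the copy function.
  BatchCopier(int batchSize, IntFunction<T[]> arrayFactory, UnaryOperator<T> copier) {
    this.batch = arrayFactory.apply(batchSize);
    this.copier = copier;
  }

  /** Copies up to batch.length records from the source; returns how many were copied. */
  int fill(Iterator<T> source) {
    int count = 0;
    while (count < batch.length && source.hasNext()) {
      batch[count++] = copier.apply(source.next());
    }
    return count;
  }

  T get(int index) {
    return batch[index];
  }
}

final class BatchCopierDemo {
  static List<String> run() {
    // An iterator that reuses one mutable object, standing in for DataIterator.
    final StringBuilder reused = new StringBuilder();
    final Iterator<String> values = Arrays.asList("a", "b", "c").iterator();
    Iterator<StringBuilder> source = new Iterator<StringBuilder>() {
      @Override public boolean hasNext() { return values.hasNext(); }
      @Override public StringBuilder next() {
        reused.setLength(0);
        reused.append(values.next());
        return reused;
      }
    };

    BatchCopier<StringBuilder> copier =
        new BatchCopier<>(2, StringBuilder[]::new, sb -> new StringBuilder(sb));
    int copied = copier.fill(source);
    source.next(); // the source overwrites its reused object; the copies are unaffected

    List<String> result = new ArrayList<>();
    for (int i = 0; i < copied; i++) {
      result.add(copier.get(i).toString());
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}
```

   The point of the sketch is that the reader itself cannot derive either 
function from T alone; something that knows the concrete record type has to 
provide them.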
   
   RowDataIterator and its underlying Flink*Reader know how to achieve the 
correct object reuse semantics. 
   
   Assuming `IcebergSourceSplitReader<T>` can implement proper cloning into a 
reusable array, we should still discuss the reuse semantics for records emitted 
by the Iceberg source. I am not sure whether Flink defines the reuse semantics 
for records emitted by a source or leaves them up to the source implementation. 
If it is up to the source and we decide that the Iceberg source should only 
emit reused records, we should clearly document that expectation. 
   
   E.g. users could have a chained downstream operator (no serialization) that 
performs de-dup aggregation in memory. The first time it sees a key, it keeps 
the record/row for that key. If another row arrives for the same key, it 
simply updates a field in the first received row. If the source reuses record 
objects, the row kept by such an operator would be overwritten by later records.
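
   A minimal, Flink-free sketch of that hazard is below. `Row` is a 
hypothetical stand-in for RowData, and the loop plays the role of a source 
chained directly to the de-dup operator; none of these names come from Iceberg 
or Flink.

```java
import java.util.HashMap;
import java.util.Map;

final class ReuseHazardDemo {
  /** Minimal mutable record standing in for RowData (hypothetical). */
  static final class Row {
    String key;
    long count;

    Row set(String newKey, long newCount) {
      this.key = newKey;
      this.count = newCount;
      return this;
    }

    Row copy() {
      return new Row().set(key, count);
    }
  }

  /**
   * De-dup aggregation: keep the first row seen per key and, for later rows
   * with the same key, update a field in the kept row.
   */
  static Map<String, Row> dedup(String[] keys, boolean copyBeforeKeep) {
    Row reused = new Row(); // the "source" emits this same object every time
    Map<String, Row> state = new HashMap<>();
    for (String key : keys) {
      Row row = reused.set(key, 1);
      Row kept = state.get(row.key);
      if (kept == null) {
        // Without a copy, the map stores a reference to the reused object.
        state.put(row.key, copyBeforeKeep ? row.copy() : row);
      } else {
        kept.count += row.count;
      }
    }
    return state;
  }

  public static void main(String[] args) {
    String[] keys = {"a", "b", "a"};
    Map<String, Row> broken = dedup(keys, false);
    Map<String, Row> correct = dedup(keys, true);
    System.out.println("no copy: key stored under b = " + broken.get("b").key);
    System.out.println("copy:    key stored under b = " + correct.get("b").key);
  }
}
```

   Without the copy, every map entry points at the one reused object, so the 
state for key `b` ends up holding whatever the source emitted last. This is 
why the reuse expectation needs to be documented if the source emits reused 
records.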
   
   
   




