stevenzwu opened a new issue #1643:
URL: https://github.com/apache/iceberg/issues/1643


   While debugging a unit test failure for the FLIP-27 source PoC code, I
realized that `RowDataIterator` always reuses and returns the same
`GenericRowData` object. It seems to expect the caller to clone or convert the
object. `TestFlinkInputFormat` was using `TestHelpers.copyRowData` to do it. I
guess Flink SQL converts RowData to Row.
   
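   This behavior makes it difficult for the SplitReader to use the iterator,
because the SplitReader fetches a batch of records: if every `next()` call
returns the same object, all entries in the batch end up pointing at the last
record read.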
   
https://github.com/stevenzwu/iceberg/blob/flip27IcebergSource/flink/src/main/java/org/apache/iceberg/flink/source/reader/IcebergSourceSplitReader.java#L80-L87
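   To make the problem concrete, here is a minimal, self-contained Java sketch that does not depend on Flink or Iceberg. `MutableRow`, `ReusingIterator`, and `fetchBatch` are hypothetical stand-ins for `GenericRowData`, `RowDataIterator`, and the SplitReader's batch loop; they only mimic the reuse behavior described above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ReuseHazardDemo {

    // Hypothetical stand-in for a mutable row such as GenericRowData.
    static final class MutableRow {
        long id;

        MutableRow copy() {
            MutableRow c = new MutableRow();
            c.id = this.id;
            return c;
        }
    }

    // Hypothetical iterator that overwrites and returns the same MutableRow
    // instance on every call, like the reuse behavior described above.
    static final class ReusingIterator implements Iterator<MutableRow> {
        private final long[] ids;
        private final MutableRow reused = new MutableRow();
        private int pos = 0;

        ReusingIterator(long... ids) {
            this.ids = ids;
        }

        @Override
        public boolean hasNext() {
            return pos < ids.length;
        }

        @Override
        public MutableRow next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            reused.id = ids[pos++];
            return reused;
        }
    }

    // Collects a whole batch before anyone reads it, the way a batch-oriented
    // reader would. When copy is false, every slot references the same object.
    static List<MutableRow> fetchBatch(Iterator<MutableRow> it, boolean copy) {
        List<MutableRow> batch = new ArrayList<>();
        while (it.hasNext()) {
            MutableRow row = it.next();
            batch.add(copy ? row.copy() : row);
        }
        return batch;
    }

    public static void main(String[] args) {
        List<MutableRow> aliased = fetchBatch(new ReusingIterator(1, 2, 3), false);
        List<MutableRow> copied = fetchBatch(new ReusingIterator(1, 2, 3), true);
        for (MutableRow r : aliased) System.out.print(r.id + " "); // prints: 3 3 3
        System.out.println();
        for (MutableRow r : copied) System.out.print(r.id + " ");  // prints: 1 2 3
        System.out.println();
    }
}
```

   Copying each record before adding it to the batch fixes the values, which is essentially what `TestHelpers.copyRowData` does in `TestFlinkInputFormat`.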
   
   Should we make `RowDataIterator` support a `reuse` flag?
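   A rough sketch of what such a flag could look like, written as a generic wrapper. `ReuseAwareIterator`, its constructor parameters, and the copier function are all hypothetical, not existing Iceberg API; in Iceberg the copier would presumably be a row-copying function similar to the `TestHelpers.copyRowData` helper used by `TestFlinkInputFormat`. `StringBuilder` stands in for a mutable row so the example runs with only the JDK.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.UnaryOperator;

// Hypothetical wrapper, not an existing Iceberg class.
public final class ReuseAwareIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private final boolean reuse;
    private final UnaryOperator<T> copier;

    public ReuseAwareIterator(Iterator<T> delegate, boolean reuse, UnaryOperator<T> copier) {
        this.delegate = delegate;
        this.reuse = reuse;
        this.copier = copier;
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public T next() {
        T next = delegate.next();
        // reuse == true: return the delegate's (possibly shared) object unchanged.
        // reuse == false: return a private copy that later next() calls cannot overwrite.
        return reuse ? next : copier.apply(next);
    }

    // Demo source that overwrites and returns one shared StringBuilder,
    // mimicking an iterator that reuses its output object.
    static Iterator<StringBuilder> reusingSource(int count) {
        return new Iterator<StringBuilder>() {
            private final StringBuilder shared = new StringBuilder();
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < count;
            }

            @Override
            public StringBuilder next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.setLength(0);
                shared.append("row-").append(i++);
                return shared;
            }
        };
    }

    // Collects every element into a batch first, then reads the batch.
    static List<String> drain(boolean reuse) {
        Iterator<StringBuilder> it =
            new ReuseAwareIterator<>(reusingSource(3), reuse, sb -> new StringBuilder(sb));
        List<StringBuilder> batch = new ArrayList<>();
        while (it.hasNext()) {
            batch.add(it.next());
        }
        List<String> out = new ArrayList<>();
        for (StringBuilder sb : batch) {
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(drain(true));  // [row-2, row-2, row-2]
        System.out.println(drain(false)); // [row-0, row-1, row-2]
    }
}
```

   The trade-off for the default: `reuse=false` gives batch-oriented callers like the SplitReader safe objects at the cost of one copy per record, while `reuse=true` keeps the current allocation-free behavior for callers that fully consume each row before calling `next()` again.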
   
   Also, what should the default `reuse` value be (true or false)? Looking at
this code in `FlinkInputFormat`, it seems that the base `InputFormat` class
expects the returned record to be the reused object whenever the `reuse`
argument is not null. If so, returning `iterator.next()` seems to break that
contract.
   ```
     @Override
     public RowData nextRecord(RowData reuse) {
       return iterator.next();
     }
   ```
   
   @JingsongLi @openinx please share your thoughts.
   

