stevenzwu opened a new issue #1643: URL: https://github.com/apache/iceberg/issues/1643
While debugging a unit test failure in the FLIP-27 source PoC code, I realized that `RowDataIterator` always reuses and returns the same `GenericRowData` object. It seems to expect the caller to clone or convert the object. `TestFlinkInputFormat` was using `TestHelpers.copyRowData` to do that, and I guess Flink SQL converts `RowData` to `Row`. This behavior makes it difficult for the SplitReader to use the iterator, because the SplitReader fetches a batch of records: https://github.com/stevenzwu/iceberg/blob/flip27IcebergSource/flink/src/main/java/org/apache/iceberg/flink/source/reader/IcebergSourceSplitReader.java#L80-L87

Should we make `RowDataIterator` support a `reuse` flag? And what should the default `reuse` value be (true or false)?

When I was looking at this code in `FlinkInputFormat`, it seems that the base `InputFormat` class expects the iterator to return only the reused object if the `reuse` input argument is not null. So the `iterator.next()` call seems to break that contract:

```java
@Override
public RowData nextRecord(RowData reuse) {
  return iterator.next();
}
```

@JingsongLi @openinx please share your thoughts.
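To make the batching problem concrete, here is a minimal, self-contained Java sketch. It deliberately avoids Iceberg and Flink: `MutableRow`, `ReusingIterator` (with its hypothetical `reuse` constructor flag) and `ReuseDemo.fetchBatch` are illustrative stand-ins for `GenericRowData`, `RowDataIterator`, and the SplitReader's batch fetch, not real APIs. With a reusing iterator and no copy, every element of the batch aliases the same object, so all entries show the last record; copying each record (much like the test does with `TestHelpers.copyRowData`) or turning reuse off yields the expected values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a mutable row such as GenericRowData. */
final class MutableRow {
  long id;

  MutableRow copy() {
    MutableRow c = new MutableRow();
    c.id = id;
    return c;
  }
}

/**
 * Hypothetical iterator with a reuse flag. When reuse is true, next() overwrites
 * and returns one shared MutableRow; when false, it allocates a fresh row per record.
 */
final class ReusingIterator implements Iterator<MutableRow> {
  private final long[] source;
  private final boolean reuse;
  private final MutableRow shared = new MutableRow();
  private int pos = 0;

  ReusingIterator(long[] source, boolean reuse) {
    this.source = source;
    this.reuse = reuse;
  }

  @Override
  public boolean hasNext() {
    return pos < source.length;
  }

  @Override
  public MutableRow next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    MutableRow row = reuse ? shared : new MutableRow();
    row.id = source[pos++];
    return row;
  }
}

public class ReuseDemo {
  /** Buffers all records into a list, the way a batching split reader might. */
  static List<MutableRow> fetchBatch(Iterator<MutableRow> it, boolean copy) {
    List<MutableRow> batch = new ArrayList<>();
    while (it.hasNext()) {
      MutableRow row = it.next();
      // Without a copy, a reusing iterator makes every list entry the same object.
      batch.add(copy ? row.copy() : row);
    }
    return batch;
  }

  static long[] ids(List<MutableRow> batch) {
    return batch.stream().mapToLong(r -> r.id).toArray();
  }

  public static void main(String[] args) {
    long[] data = {1L, 2L, 3L};
    // Reused object, no copy: prints [3, 3, 3]
    System.out.println(Arrays.toString(ids(fetchBatch(new ReusingIterator(data, true), false))));
    // Reused object, copied per record: prints [1, 2, 3]
    System.out.println(Arrays.toString(ids(fetchBatch(new ReusingIterator(data, true), true))));
    // No reuse: prints [1, 2, 3]
    System.out.println(Arrays.toString(ids(fetchBatch(new ReusingIterator(data, false), false))));
  }
}
```

Reuse saves one allocation per record, which suits record-at-a-time consumers that finish with a row before asking for the next one; a consumer that buffers records has to either copy them or ask for fresh objects. The sketch only illustrates that trade-off and does not settle which default is right.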
