stevenzwu edited a comment on issue #1643:
URL: https://github.com/apache/iceberg/issues/1643#issuecomment-714700304
Let's look at the nextRecord contract from the `InputFormat`. reuse or not
is controlled by the caller of `nextRecord(OT reuse)`.
```
/**
* @param reuse Object that may be reused.
*/
OT nextRecord(OT reuse) throws IOException;
```
If the input `reuse` arg is null, there is no reuse. Otherwise, input arg
should be used for deserialization. That is how `BinaryInputFormat` implemented.
```
@Override
public T nextRecord(T record) throws IOException {
// ...
record = this.deserialize(record, this.dataInputStream);
// ...
}
```
`CloseableIterable<T>` interface is just NOT a good fit for InputFormat
interface. Now the `FlinkInputFormat` ignore the reuse hint from caller and
simply returns reused RowData object.
```
@Override
public RowData nextRecord(RowData reuse) {
return iterator.next();
}
```
So technically, `FlinkInputFormat` is not conforming to the contract of
`InputFormat` interface. If we truly want to implement the correct nextRecord
contract, we need to change the `CloseableIterable/Iterator` to support `next(T
reuse)`, which is impossible to change the java `Iterator` interface. If it
can't conform to the contract, maybe the default behavior should be the more
conservative (and less performant) behavior of no reuse.
We don't use InputFormat internally. So its behavior is not very relevant to
us. This is purely a semantic discussion. I will address other issues in
separate comments.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]