openinx commented on a change in pull request #2305:
URL: https://github.com/apache/iceberg/pull/2305#discussion_r740147179
##########
File path: flink/src/main/java/org/apache/iceberg/flink/source/DataIterator.java
##########
@@ -41,18 +42,47 @@
private final FileScanTaskReader<T> fileScanTaskReader;
private final InputFilesDecryptor inputFilesDecryptor;
- private Iterator<FileScanTask> tasks;
+ private final CombinedScanTask combinedTask;
+ private final Position position;
+
+ private Iterator<FileScanTask> fileTasksIterator;
private CloseableIterator<T> currentIterator;
public DataIterator(FileScanTaskReader<T> fileScanTaskReader,
CombinedScanTask task,
FileIO io, EncryptionManager encryption) {
this.fileScanTaskReader = fileScanTaskReader;
this.inputFilesDecryptor = new InputFilesDecryptor(task, io, encryption);
- this.tasks = task.files().iterator();
+ this.combinedTask = task;
+ // fileOffset starts at -1 because we started
+ // from an empty iterator that is not from the split files.
+ this.position = new Position(-1, 0L);
Review comment:
The general `DataIterator` doesn't use the position or the `seek` method to
skip tasks or records. Putting all the FLIP-27 related logic in the common
Flink read path does not make sense to me, because every time I read this
class I need to work out which parts are related to FLIP-27 and which are
not.
I would suggest introducing a separate `SeekableDataIterator` to isolate the
two code paths. I made a simple commit for this:
https://github.com/openinx/incubator-iceberg/commit/b08dde86aae0c718d9d72acb347dffb3a836b336,
you may want to take a look.
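
To illustrate the kind of separation suggested above, here is a minimal,
hypothetical sketch. It is not the code from the linked commit and does not
use the Iceberg classes: `PlainDataIterator`, the `openNextFile` hook, and the
list-of-lists standing in for the file scan tasks are all assumptions made for
the example. Only the subclass tracks a position (starting at file offset -1,
record offset 0, as in the diff) and offers `seek`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the common read path: iterates the records of
// each "file" in order and knows nothing about positions.
class PlainDataIterator<T> implements Iterator<T> {
  private final Iterator<List<T>> files;
  private Iterator<T> current = Collections.emptyIterator();

  PlainDataIterator(List<List<T>> files) {
    this.files = files.iterator();
  }

  // Extension point (an assumption of this sketch): open the next file.
  protected Iterator<T> openNextFile() {
    return files.next().iterator();
  }

  @Override
  public boolean hasNext() {
    // Move past exhausted or empty files until a record is available.
    while (!current.hasNext() && files.hasNext()) {
      current = openNextFile();
    }
    return current.hasNext();
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return current.next();
  }
}

// Hypothetical seekable variant: all position tracking lives here.
class SeekableDataIterator<T> extends PlainDataIterator<T> {
  private int fileOffset = -1;     // index of the most recently opened file
  private long recordOffset = 0L;  // records already read from that file

  SeekableDataIterator(List<List<T>> files) {
    super(files);
  }

  @Override
  protected Iterator<T> openNextFile() {
    fileOffset++;
    recordOffset = 0L;
    return super.openNextFile();
  }

  @Override
  public T next() {
    T record = super.next();
    recordOffset++;
    return record;
  }

  // Skips records until the next record to be returned is the one at
  // (startingFileOffset, startingRecordOffset), or the input is exhausted.
  void seek(int startingFileOffset, long startingRecordOffset) {
    while (hasNext()
        && (fileOffset < startingFileOffset
            || (fileOffset == startingFileOffset
                && recordOffset < startingRecordOffset))) {
      next();
    }
  }

  int fileOffset() {
    return fileOffset;
  }

  long recordOffset() {
    return recordOffset;
  }
}

class SeekDemo {
  public static void main(String[] args) {
    List<List<String>> files =
        Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c", "d", "e"));
    SeekableDataIterator<String> it = new SeekableDataIterator<>(files);
    it.seek(1, 1L);
    System.out.println(it.next()); // prints d
  }
}
```

This sketch seeks by reading and discarding records, which is the simplest
thing that works; a real implementation could skip whole files without
opening them. The point is only that the plain iterator stays free of any
position bookkeeping.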
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]