slinkydeveloper edited a comment on pull request #17520: URL: https://github.com/apache/flink/pull/17520#issuecomment-948377409
@tsreaper I'm starting to get your point about laziness, but now I wonder: why is Avro different from, for example, Parquet? In the Parquet format, I see that when you invoke `readBatch` you load the actual data into memory and then return an iterator over it, so the I/O is performed within the `readBatch` invocation. The same goes for all the other formats I've looked at.

Your lazy reading changes the threading model of the source, because with this PR the "consumer" thread (`SourceReaderBase::pollNext` in particular) becomes the thread that performs the actual I/O. Why is Avro special in this sense? Why don't we do the same for Parquet? And are we sure this is the way the source architecture is intended to work? Couldn't this cause issues, given that the original design might not have considered performing I/O within the consumer thread?

Even with laziness, I still think we don't need concurrency primitives to serialize `nextBatch` invocations, because each `SourceOperator` has its own instance of `SourceReader` (look at the method `SourceOperator::initReader`), and `SourceOperator` is what invokes `SourceReaderBase::pollNext`, which triggers the actual reading of Avro records from your lazy operator.
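
To make the threading concern concrete, here is a minimal sketch in plain Java. It is not Flink code and does not use any Flink API: `LazyVsEagerSketch`, `readRecord`, `eagerReadBatch`, `lazyReadBatch`, `ioThreadsFor`, and the thread name `fetcher` are all hypothetical stand-ins. It assumes the batch is created on one thread and drained on another, and it only records which thread ends up running the simulated I/O in each style.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class LazyVsEagerSketch {

    /** Hypothetical stand-in for a blocking file read; records which thread ran it. */
    static String readRecord(int i, List<String> ioThreads) {
        ioThreads.add(Thread.currentThread().getName());
        return "record-" + i;
    }

    /** Eager style: every read happens inside this call, on the thread that calls it. */
    static Iterator<String> eagerReadBatch(int n, List<String> ioThreads) {
        List<String> batch = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            batch.add(readRecord(i, ioThreads));
        }
        return batch.iterator();
    }

    /** Lazy style: this call does no I/O; each read happens in next(), on the iterating thread. */
    static Iterator<String> lazyReadBatch(int n, List<String> ioThreads) {
        return new Iterator<String>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < n;
            }

            @Override
            public String next() {
                return readRecord(i++, ioThreads);
            }
        };
    }

    /** Creates the batch on a "fetcher" thread, drains it on the calling thread, returns the I/O thread names. */
    public static List<String> ioThreadsFor(boolean lazy) throws Exception {
        List<String> ioThreads = Collections.synchronizedList(new ArrayList<>());
        ExecutorService fetcher = Executors.newSingleThreadExecutor(r -> new Thread(r, "fetcher"));
        try {
            Callable<Iterator<String>> task =
                    () -> lazy ? lazyReadBatch(3, ioThreads) : eagerReadBatch(3, ioThreads);
            Iterator<String> batch = fetcher.submit(task).get();
            // The calling thread plays the role of the consumer draining the batch.
            while (batch.hasNext()) {
                batch.next();
            }
        } finally {
            fetcher.shutdown();
        }
        return ioThreads;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("eager I/O threads: " + ioThreadsFor(false));
        System.out.println("lazy I/O threads:  " + ioThreadsFor(true));
    }
}
```

In the eager variant all reads are attributed to the thread that built the batch, while in the lazy variant they move to whichever thread iterates. That shift is the change in threading model I am asking about.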