nbali edited a comment on pull request #15951: URL: https://github.com/apache/beam/pull/15951#issuecomment-1043793735
@lukecwik

> That makes a lot of sense. An alternative would be to add support for stop read time to KafkaUnboundedReader.
>
> This translation seems like it will always be brittle in that some feature/option won't be supported in KafkaUnboundedReader and people will forget to update it here and then a future person will go down this rabbit hole again.

A PR is coming soon that detects any misuse/lost functionality of `KafkaIO.Read`, failing fast at pipeline creation, and that also forces developers to add their newly introduced properties to that detection :)

@kennknowles

> @nbali it can be a bit confusing, but the `isStreaming` pipeline option is not actually part of the core Beam model. It is a runner-specific option. Spark and Dataflow have separate batch/streaming modes. The direct runner and Flink runner don't need this. Really "streaming" is the universal mode that works for everything, while batch is a special case that allows optimizations because all the data is bounded (so we can do more splitting up front, and don't need to checkpoint and pause, etc).

I'm fully aware that my knowledge of runners other than the direct and Dataflow runners is minimal, since those are the only ones I have used, and I doubt that will change. The logic in Spark seemed similar, but there it might actually be required. My issue was that with Dataflow it isn't, but let's continue this discussion on my PR addressing that issue: https://github.com/apache/beam/pull/16773
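The fail-fast detection mentioned above could be sketched roughly as below. This is a hypothetical, self-contained illustration, not Beam's actual mechanism: `ReadConfig`, `FailFastCheck`, and `KNOWN_PROPERTIES` are invented names standing in for a `KafkaIO.Read`-style spec and its translation layer. The idea is to reflect over the spec's getters and compare them against an explicit allow-list of properties the translation handles, so a newly added property that nobody registered fails at pipeline-construction time instead of being silently dropped:

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical stand-in for a KafkaIO.Read-style configuration object. */
class ReadConfig {
    public String getTopic() { return "events"; }
    public Long getMaxNumRecords() { return null; }
    // A property added later that nobody registered in the allow-list below:
    public Long getStopReadTime() { return null; }
}

public class FailFastCheck {
    /** Properties the (hypothetical) translation layer knows how to handle. */
    static final Set<String> KNOWN_PROPERTIES =
        new HashSet<>(Arrays.asList("getTopic", "getMaxNumRecords"));

    /** Returns getters declared on the config class but absent from the allow-list. */
    static Set<String> unknownProperties(Class<?> configClass) {
        return Arrays.stream(configClass.getDeclaredMethods())
            .map(Method::getName)
            .filter(n -> n.startsWith("get"))
            .filter(n -> !KNOWN_PROPERTIES.contains(n))
            .collect(Collectors.toSet());
    }

    /** Throws at "pipeline construction" if any property is unhandled. */
    static void validate(Class<?> configClass) {
        Set<String> unknown = unknownProperties(configClass);
        if (!unknown.isEmpty()) {
            throw new IllegalStateException("Unhandled properties: " + unknown);
        }
    }

    public static void main(String[] args) {
        try {
            validate(ReadConfig.class);
        } catch (IllegalStateException e) {
            // getStopReadTime was never registered, so construction is rejected.
            System.out.println("Rejected at construction: " + e.getMessage());
        }
    }
}
```

Because the check enumerates the config class itself, a developer who adds a new property without updating the allow-list gets an immediate failure rather than a silently unsupported feature.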
