nbali commented on pull request #15951:
URL: https://github.com/apache/beam/pull/15951#issuecomment-1030165710
Well, we were both right and wrong. The lack of SDF is the problem, but the
runner is causing it.
... and the lack of that experiment isn't the cause. For DataflowRunner you
don't need it:
* The if condition already returns the same value it would return if that
experiment were enabled:
https://github.com/apache/beam/blob/163ac6a3c10c26898ad89ca8bedde8ef78ee7ee2/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L1309-L1322
* Pipeline creation would fail without SDF if stopReadTime is set:
https://github.com/apache/beam/blob/163ac6a3c10c26898ad89ca8bedde8ef78ee7ee2/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L1301-L1303
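The two points above can be pictured with a simplified plain-Java model of the read-path decision. This is a sketch and not KafkaIO's code. `ReadPathSketch`, `chooseReadPath` and the boolean flags are hypothetical names. The behaviour for non-Dataflow runners is an assumption. Only two things follow what is described above: on Dataflow the experiment doesn't change the outcome, and stopReadTime fails fast without SDF.

```java
/**
 * Hypothetical, simplified model of the KafkaIO read-path choice.
 * NOT the real KafkaIO code: names and the non-Dataflow branch are assumptions.
 */
public class ReadPathSketch {

  enum ReadPath { SDF, LEGACY_UNBOUNDED }

  // Assumption: Dataflow already resolves to the SDF path by itself, so the
  // experiment flag only changes the outcome for other runners.
  static boolean requiresLegacyRead(boolean isDataflowRunner, boolean sdfExperimentEnabled) {
    if (sdfExperimentEnabled) {
      return false;
    }
    return !isDataflowRunner;
  }

  static ReadPath chooseReadPath(
      boolean isDataflowRunner, boolean sdfExperimentEnabled, boolean stopReadTimeSet) {
    ReadPath path =
        requiresLegacyRead(isDataflowRunner, sdfExperimentEnabled)
            ? ReadPath.LEGACY_UNBOUNDED
            : ReadPath.SDF;
    // stopReadTime is only supported by the SDF read, so construction fails
    // here instead of silently ignoring the setting.
    if (stopReadTimeSet && path != ReadPath.SDF) {
      throw new IllegalArgumentException("stopReadTime requires the SDF-based read");
    }
    return path;
  }

  public static void main(String[] args) {
    // On Dataflow the experiment flag does not change the outcome.
    System.out.println(chooseReadPath(true, false, true)); // SDF
    System.out.println(chooseReadPath(true, true, true)); // SDF
  }
}
```

In this model the check passes on Dataflow regardless of the flag. That is why the missing experiment can't explain stopReadTime being ignored.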
The problem is this method:
https://github.com/apache/beam/blob/163ac6a3c10c26898ad89ca8bedde8ef78ee7ee2/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java#L478
DataflowRunner replaces the SDF with a custom implementation. As a result, the
SDF isn't part of the graph anymore, so anything configured inside that SDF is
completely ignored by the DataflowRunner. I have no idea where to escalate
this, but this is a LOT of lost functionality without any warning whatsoever.
https://github.com/apache/beam/blob/163ac6a3c10c26898ad89ca8bedde8ef78ee7ee2/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java#L1872-L1890
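A toy model shows why nothing fails loudly. Once a runner swaps a graph node for its own implementation, any setting the replacement doesn't carry over simply disappears. `OverrideSketch`, `Node`, `replace` and `droppedSettings` are hypothetical names for illustration, not Beam or DataflowRunner APIs. The rule that the replacement copies only the settings it supports is an assumption, not a description of the linked override code.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical model of a runner override replacing a graph node.
 * Illustrative only; not the Beam or DataflowRunner API.
 */
public class OverrideSketch {

  /** A graph node, described only by the settings it will actually honor. */
  static final class Node {
    final String name;
    final Map<String, String> settings;

    Node(String name, Map<String, String> settings) {
      this.name = name;
      this.settings = settings;
    }
  }

  /** Assumption: the replacement keeps only the settings it knows about. */
  static Node replace(Node original, Set<String> supportedByReplacement) {
    Map<String, String> kept = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : original.settings.entrySet()) {
      if (supportedByReplacement.contains(e.getKey())) {
        kept.put(e.getKey(), e.getValue());
      }
    }
    return new Node("RunnerSpecificRead", kept);
  }

  /** Settings configured on the original node that vanished after replacement. */
  static Set<String> droppedSettings(Node original, Node replacement) {
    Set<String> dropped = new LinkedHashSet<>(original.settings.keySet());
    dropped.removeAll(replacement.settings.keySet());
    return dropped;
  }

  public static void main(String[] args) {
    Map<String, String> settings = new LinkedHashMap<>();
    settings.put("topic", "events");
    settings.put("stopReadTime", "2022-02-01T00:00:00Z");
    Node sdfRead = new Node("SdfRead", settings);

    Node replaced = replace(sdfRead, Set.of("topic"));
    // No error is raised: the setting is silently absent from the executed graph.
    System.out.println(droppedSettings(sdfRead, replaced)); // [stopReadTime]
  }
}
```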
--
This is an automated message from the Apache Git Service.