nbali commented on pull request #15951:
URL: https://github.com/apache/beam/pull/15951#issuecomment-1030165710


   Well, we were both right and wrong. The lack of SDF is the problem, but the 
runner is causing it.
   
   ... and the lack of that experiment isn't the cause. For DataflowRunner you 
don't need it:
   * The if condition already returns the same value as it would if that 
experiment were enabled:
   
https://github.com/apache/beam/blob/163ac6a3c10c26898ad89ca8bedde8ef78ee7ee2/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L1309-L1322
   * The pipeline creation would fail without SDF if stopReadTime is set:
   
https://github.com/apache/beam/blob/163ac6a3c10c26898ad89ca8bedde8ef78ee7ee2/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L1301-L1303
   
   The problem is this method:
   
https://github.com/apache/beam/blob/163ac6a3c10c26898ad89ca8bedde8ef78ee7ee2/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java#L478
   
   DataflowRunner replaces the SDF with a custom implementation. This way the 
SDF isn't part of the graph anymore, so anything done inside that SDF gets 
completely ignored by the DataflowRunner. I have no idea where to escalate 
this, but this is a LOT of lost functionality without any warning whatsoever.
   
https://github.com/apache/beam/blob/163ac6a3c10c26898ad89ca8bedde8ef78ee7ee2/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java#L1872-L1890
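   
   To illustrate the effect described above, here is a minimal toy model of a 
runner-side override. It is only a sketch: `Transform`, `SdfRead`, 
`RunnerNativeRead`, `ToyRunner` and the step names are hypothetical stand-ins, 
not Beam or Dataflow APIs, and the real override logic is in the links above. 
It only shows why logic living inside the replaced transform never reaches the 
graph.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Toy model of a runner-side transform override; none of these types are Beam APIs. */
public class OverrideSketch {

  /** Hypothetical stand-in for a transform that expands into named graph steps. */
  interface Transform {
    List<String> expand();
  }

  /** Stand-in for the SDF-based read; its expansion carries the stop-time check. */
  static class SdfRead implements Transform {
    @Override
    public List<String> expand() {
      return List.of("ReadViaSDF", "CheckStopReadTime");
    }
  }

  /** Stand-in for the runner's custom replacement, which has no stop-time step. */
  static class RunnerNativeRead implements Transform {
    @Override
    public List<String> expand() {
      return List.of("RunnerNativeRead");
    }
  }

  /** Toy runner: swaps a transform for its registered override before expanding it. */
  static class ToyRunner {
    private final Map<Class<?>, Supplier<Transform>> overrides = new HashMap<>();

    void registerOverride(Class<?> original, Supplier<Transform> replacement) {
      overrides.put(original, replacement);
    }

    List<String> buildGraph(Transform transform) {
      Supplier<Transform> replacement = overrides.get(transform.getClass());
      // If an override exists, the original transform's expand() is never called.
      Transform effective = replacement != null ? replacement.get() : transform;
      return effective.expand();
    }
  }

  public static void main(String[] args) {
    ToyRunner plain = new ToyRunner();
    ToyRunner overriding = new ToyRunner();
    overriding.registerOverride(SdfRead.class, RunnerNativeRead::new);

    System.out.println(plain.buildGraph(new SdfRead()));      // [ReadViaSDF, CheckStopReadTime]
    System.out.println(overriding.buildGraph(new SdfRead())); // [RunnerNativeRead]
  }
}
```

   In this toy model the stop-time step disappears silently once the override 
is registered, which matches the "no warning whatsoever" behaviour I'm seeing.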
   
   

