[ 
https://issues.apache.org/jira/browse/BEAM-13171?focusedWorklogId=721016&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-721016
 ]

ASF GitHub Bot logged work on BEAM-13171:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 04/Feb/22 16:52
            Start Date: 04/Feb/22 16:52
    Worklog Time Spent: 10m 
      Work Description: nbali commented on pull request #15951:
URL: https://github.com/apache/beam/pull/15951#issuecomment-1030165710


   Well, we were both right and wrong. The lack of SDF is the problem, but the 
runner is what causes it.
   
   ... and the lack of that experiment isn't the cause. For DataflowRunner you 
don't need it:
   * The if condition already returns the same value it would return if that 
experiment were enabled:
   
https://github.com/apache/beam/blob/163ac6a3c10c26898ad89ca8bedde8ef78ee7ee2/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L1309-L1322
   * The pipeline creation would fail without SDF if stopReadTime is set:
   
https://github.com/apache/beam/blob/163ac6a3c10c26898ad89ca8bedde8ef78ee7ee2/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L1301-L1303
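
The two points above amount to a guard at pipeline-construction time: if stopReadTime is set, the IO must already have chosen the SDF path, or construction fails. A minimal plain-Java sketch of that shape follows. `ReadPlanner`, `ReadConfig`, `legacyReadForced` and `plan` are hypothetical names, and the real condition in the linked KafkaIO lines is not reproduced here.

```java
import java.time.Instant;

// Hypothetical model of the construction-time guard; NOT the real KafkaIO.Read code.
final class ReadPlanner {
  /** Hypothetical read configuration. */
  static final class ReadConfig {
    // Stands in for "an experiment or the runner forces the legacy (non-SDF) read".
    final boolean legacyReadForced;
    // Null when no stop time is configured.
    final Instant stopReadTime;

    ReadConfig(boolean legacyReadForced, Instant stopReadTime) {
      this.legacyReadForced = legacyReadForced;
      this.stopReadTime = stopReadTime;
    }
  }

  /** Picks the read path; rejects stopReadTime when the SDF path would not be used. */
  static String plan(ReadConfig config) {
    boolean useSdf = !config.legacyReadForced;
    if (config.stopReadTime != null && !useSdf) {
      // Failing here means any pipeline that does get built with stopReadTime
      // has already been routed to the SDF path by the IO itself.
      throw new IllegalArgumentException("stopReadTime requires the SDF-based read");
    }
    return useSdf ? "SDF" : "legacy";
  }

  public static void main(String[] args) {
    System.out.println(plan(new ReadConfig(false, Instant.parse("2022-02-01T00:00:00Z")))); // SDF
    try {
      plan(new ReadConfig(true, Instant.parse("2022-02-01T00:00:00Z")));
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

This is why the missing experiment can be ruled out: a pipeline that builds successfully with stopReadTime set has, by construction, taken the SDF branch in the IO.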
   
   The problem is this method:
   
https://github.com/apache/beam/blob/163ac6a3c10c26898ad89ca8bedde8ef78ee7ee2/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java#L478
   
   DataflowRunner replaces the SDF with a custom implementation. As a result the 
SDF is no longer part of the graph, so anything done inside that SDF (such as 
honoring stopReadTime) is silently ignored by DataflowRunner. I have no idea 
where to escalate this, but it is a LOT of lost functionality without any 
warning whatsoever.
   
https://github.com/apache/beam/blob/163ac6a3c10c26898ad89ca8bedde8ef78ee7ee2/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java#L1872-L1890
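
The failure mode described above can be modelled with a toy graph. In this plain-Java sketch (all names hypothetical, not the Beam or Dataflow API), a "runner" swaps every node of a given kind for its own replacement, built only from the fields it knows about. Anything else the original node carried, such as a stop time, disappears without an error.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical toy model of a runner-level transform override; NOT Beam/Dataflow code.
final class OverrideDemo {
  /** Hypothetical graph node: a transform kind plus an optional stop time. */
  static final class Node {
    final String kind;
    final Instant stopReadTime; // may be null

    Node(String kind, Instant stopReadTime) {
      this.kind = kind;
      this.stopReadTime = stopReadTime;
    }
  }

  /** Returns a new graph where every node of the given kind is replaced via the factory. */
  static List<Node> applyOverride(List<Node> graph, String kind, Function<Node, Node> factory) {
    List<Node> out = new ArrayList<>();
    for (Node n : graph) {
      out.add(n.kind.equals(kind) ? factory.apply(n) : n);
    }
    return out;
  }

  public static void main(String[] args) {
    List<Node> graph = List.of(new Node("KafkaSdfRead", Instant.parse("2022-02-01T00:00:00Z")));
    // The replacement does not forward stopReadTime, and nothing reports the loss.
    List<Node> replaced = applyOverride(graph, "KafkaSdfRead", n -> new Node("RunnerNativeRead", null));
    // prints "RunnerNativeRead stopReadTime=null"
    System.out.println(replaced.get(0).kind + " stopReadTime=" + replaced.get(0).stopReadTime);
  }
}
```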
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.



Issue Time Tracking
-------------------

    Worklog Id:     (was: 721016)
    Time Spent: 5h 10m  (was: 5h)

> Support for stopReadTime on KafkaIO SDF 
> ----------------------------------------
>
>                 Key: BEAM-13171
>                 URL: https://issues.apache.org/jira/browse/BEAM-13171
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-kafka
>            Reporter: Mostafa Aghajani
>            Assignee: Mostafa Aghajani
>            Priority: P2
>             Fix For: 2.36.0
>
>          Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> There is already support for startReadTime using SDF when the Kafka 
> version is supported.
> I want to add support for stopReadTime so we can extract messages from 
> Kafka only up to a point in time, after which the task finishes.
> One use case: re-processing (re-reading) only a specific period of time of 
> a Kafka topic in your pipeline.
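
The requested behaviour (start at one point in time, stop at another, then finish) can be sketched over an in-memory partition. This plain-Java model is hypothetical and is not the KafkaIO implementation. It assumes a half-open window [startReadTime, stopReadTime), resolves each bound to the first offset whose timestamp is at or after it (similar in spirit to Kafka's `KafkaConsumer.offsetsForTimes`), and assumes timestamps are non-decreasing within the partition; the list index stands in for the Kafka offset.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a bounded, time-windowed re-read of one partition.
final class TimeWindowRead {
  /** Hypothetical record; its list index plays the role of the Kafka offset. */
  static final class Rec {
    final long timestampMillis;
    final String value;

    Rec(long timestampMillis, String value) {
      this.timestampMillis = timestampMillis;
      this.value = value;
    }
  }

  /** First index whose timestamp is >= t, or the partition size if there is none. */
  static int offsetForTime(List<Rec> partition, long t) {
    for (int i = 0; i < partition.size(); i++) {
      if (partition.get(i).timestampMillis >= t) {
        return i;
      }
    }
    return partition.size();
  }

  /** Reads values in [startMillis, stopMillis) and then stops, i.e. the read is bounded. */
  static List<String> read(List<Rec> partition, long startMillis, long stopMillis) {
    int from = offsetForTime(partition, startMillis);
    int to = offsetForTime(partition, stopMillis);
    List<String> out = new ArrayList<>();
    for (int i = from; i < to; i++) {
      out.add(partition.get(i).value);
    }
    return out;
  }

  public static void main(String[] args) {
    List<Rec> p = List.of(new Rec(100, "a"), new Rec(200, "b"), new Rec(300, "c"), new Rec(400, "d"));
    System.out.println(read(p, 150, 350)); // prints [b, c]
  }
}
```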



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
