[jira] [Work logged] (BEAM-3925) Allow ValueProvider for KafkaIO

ASF GitHub Bot (JIRA) Thu, 26 Apr 2018 10:46:16 -0700

     [ 
https://issues.apache.org/jira/browse/BEAM-3925?focusedWorklogId=95674&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-95674
 ]


ASF GitHub Bot logged work on BEAM-3925:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/Apr/18 17:45
            Start Date: 26/Apr/18 17:45
    Worklog Time Spent: 10m 
      Work Description: pupamanyu commented on issue #5141: [BEAM-3925] Allow 
ValueProvider for KafkaIO so that we can create Beam Templates using KafkaIO
URL: https://github.com/apache/beam/pull/5141#issuecomment-384728529
 
 
   When a consumer subscribes to a Kafka topic, Kafka handles partition
   assignment and rebalancing (within the consumer group) automatically when
   failures occur. But if a process needs a consumer to read from a specific
   list of topic partitions, as described under Manual Assignment in the
   Kafka documentation, then the TopicPartition option helps. Templates are
   staged Beam pipelines that users can execute without writing code, while
   still being able to set some parameters (only the ones that are exposed)
   without any modification to the pipeline code. I hope this answers your
   first question.
   
   
   I can provide more insight into why the TopicPartitions option gives Beam
   templates more flexibility. A Kafka client can read either from a specific
   list of partitions or from the topic as a whole. The list of
   TopicPartitions is provided as a CSV in the format <topic name>:<partition
   number 1>, <topic name>:<partition number 2>, <topic name>:<partition
   number 3>...
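   
   As an illustration of the CSV format above, here is a minimal Java sketch
   of how such a template parameter could be parsed. It is not the code from
   the pull request: the class TopicPartitionCsv, the TopicPartitionSpec
   holder (a dependency-free stand-in for Kafka's TopicPartition), and the
   topic names are all hypothetical, and treating an empty value as "no
   explicit partitions" is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class TopicPartitionCsv {

    /** Hypothetical stand-in for Kafka's TopicPartition, so the sketch needs no dependencies. */
    public static final class TopicPartitionSpec {
        final String topic;
        final int partition;

        TopicPartitionSpec(String topic, int partition) {
            this.topic = topic;
            this.partition = partition;
        }

        @Override
        public String toString() {
            return topic + "-" + partition;
        }
    }

    /** Parses "topicA:0, topicA:1, topicB:3" into a list of (topic, partition) pairs. */
    public static List<TopicPartitionSpec> parse(String csv) {
        List<TopicPartitionSpec> result = new ArrayList<>();
        if (csv == null || csv.trim().isEmpty()) {
            // Assumption: an empty parameter means no explicit partitions were requested.
            return result;
        }
        for (String entry : csv.split(",")) {
            String trimmed = entry.trim();
            // The partition number is whatever follows the last ':' in the entry.
            int sep = trimmed.lastIndexOf(':');
            if (sep <= 0 || sep == trimmed.length() - 1) {
                throw new IllegalArgumentException(
                    "Expected <topic name>:<partition number>, got: '" + trimmed + "'");
            }
            String topic = trimmed.substring(0, sep).trim();
            int partition;
            try {
                partition = Integer.parseInt(trimmed.substring(sep + 1).trim());
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException(
                    "Partition is not an integer in: '" + trimmed + "'", e);
            }
            if (partition < 0) {
                throw new IllegalArgumentException(
                    "Partition must be non-negative in: '" + trimmed + "'");
            }
            result.add(new TopicPartitionSpec(topic, partition));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical topic names, used only to exercise the parser.
        for (TopicPartitionSpec tp : parse("orders:0, orders:1, payments:3")) {
            System.out.println(tp);
        }
    }
}
```

   In a real pipeline, each parsed pair would presumably be converted to an
   org.apache.kafka.common.TopicPartition before being handed to the reader.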
   
   Accepting a list of partitions this way provides more flexibility. The
   Kafka documentation gives more details on when manual assignment of topic
   partitions is useful for consumers:
   
   
https://kafka.apache.org/0110/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
   
   Snippet from the documentation below:
   
   However, in some cases you may need finer control over the specific
   partitions that are assigned. For example:
   
   
      - If the process is maintaining some kind of local state associated with
      that partition (like a local on-disk key-value store), then it should only
      get records for the partition it is maintaining on disk.
      - If the process itself is highly available and will be restarted if it
      fails (perhaps using a cluster management framework like YARN, Mesos, or
      AWS facilities, or as part of a stream processing framework). In this case
      there is no need for Kafka to detect the failure and reassign the
      partition since the consuming process will be restarted on another machine.
   
   
   
   
   
   
   
   On Thu, Apr 26, 2018 at 10:08 AM, Raghu Angadi <[email protected]>
   wrote:
   
   > Thanks for the update Pramod. Made a few comments above. I will think a
   > bit more about the overall approach to providing these options. If reader
   > is initialized with topic, the partition list is read at the graph building
   > time. I don't have much experience with templates, but the graph building
   > phase is not executed for a template, right? When is the number of source
   > partitions decided in a template?
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 95674)
    Time Spent: 5h 20m  (was: 5h 10m)

> Allow ValueProvider for KafkaIO
> -------------------------------
>
>                 Key: BEAM-3925
>                 URL: https://issues.apache.org/jira/browse/BEAM-3925
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>            Reporter: Sameer Abhyankar
>            Assignee: Pramod Upamanyu
>            Priority: Major
>          Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Add ValueProvider support for the various methods in KafkaIO. This would 
> allow us to use KafkaIO in reusable pipeline templates.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
