[
https://issues.apache.org/jira/browse/BEAM-3925?focusedWorklogId=95674&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-95674
]
ASF GitHub Bot logged work on BEAM-3925:
----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Apr/18 17:45
Start Date: 26/Apr/18 17:45
Worklog Time Spent: 10m
Work Description: pupamanyu commented on issue #5141: [BEAM-3925] Allow
ValueProvider for KafkaIO so that we can create Beam Templates using KafkaIO
URL: https://github.com/apache/beam/pull/5141#issuecomment-384728529
When a consumer subscribes to a Kafka topic, Kafka automatically handles
partition assignment and rebalancing within the consumer group when
failures occur. If a process instead needs a consumer to read from a
specific list of topic partitions, as described under manual partition
assignment in the Kafka documentation, the TopicPartition option helps.
Templates are staged Beam pipelines that users can execute without writing
code, while still being able to set the parameters the template exposes,
with no modification to the pipeline code. I hope this answers your first
question.
I can provide more insight into why the TopicPartitions option gives Beam
Templates more flexibility. A Kafka client can read either from a specific
list of partitions or from the topic as a whole. The list of
TopicPartitions is provided as a CSV in the format
<topic name>:<partition number 1>, <topic name>:<partition number 2>,
<topic name>:<partition number 3>...
The Kafka documentation provides more details on how manual assignment of
topic partitions is useful for consumers.
https://kafka.apache.org/0110/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
Snippet from the documentation below:
However, in some cases you may need finer control over the specific
partitions that are assigned. For example:
- If the process is maintaining some kind of local state associated with
that partition (like a local on-disk key-value store), then it should only
get records for the partition it is maintaining on disk.
- If the process itself is highly available and will be restarted if it
fails (perhaps using a cluster management framework like YARN, Mesos, or
AWS facilities, or as part of a stream processing framework). In this case
there is no need for Kafka to detect the failure and reassign the
partition since the consuming process will be restarted on another machine.
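To make the CSV format for TopicPartitions described above concrete, here
is a minimal sketch of how such a string could be parsed. This is not the
code in the pull request: the class TopicPartitionCsv and its nested Entry
type are hypothetical names, and Entry is only a stand-in for
org.apache.kafka.common.TopicPartition so that the sketch has no
dependencies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of KafkaIO or this PR.
final class TopicPartitionCsv {

  // Stand-in for org.apache.kafka.common.TopicPartition (assumption:
  // real code would construct that class instead).
  static final class Entry {
    final String topic;
    final int partition;

    Entry(String topic, int partition) {
      this.topic = topic;
      this.partition = partition;
    }
  }

  // Parses "<topic>:<partition>, <topic>:<partition>, ..." into entries.
  // A null or blank input yields an empty list.
  static List<Entry> parse(String csv) {
    List<Entry> result = new ArrayList<>();
    if (csv == null || csv.trim().isEmpty()) {
      return result;
    }
    for (String item : csv.split(",")) {
      String trimmed = item.trim();
      // The partition number follows the last ':' in the item.
      int sep = trimmed.lastIndexOf(':');
      if (sep <= 0 || sep == trimmed.length() - 1) {
        throw new IllegalArgumentException(
            "Expected <topic>:<partition> but got '" + trimmed + "'");
      }
      String topic = trimmed.substring(0, sep).trim();
      int partition;
      try {
        partition = Integer.parseInt(trimmed.substring(sep + 1).trim());
      } catch (NumberFormatException e) {
        throw new IllegalArgumentException(
            "Partition is not a number in '" + trimmed + "'", e);
      }
      if (partition < 0) {
        throw new IllegalArgumentException(
            "Partition must be non-negative in '" + trimmed + "'");
      }
      result.add(new Entry(topic, partition));
    }
    return result;
  }
}
```

With this sketch, TopicPartitionCsv.parse("orders:0, orders:1") yields two
entries for the (made-up) topic "orders". In a template, the string would
be supplied as a runtime parameter, so parsing has to happen when the
value becomes available rather than while the graph is built.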
On Thu, Apr 26, 2018 at 10:08 AM, Raghu Angadi <[email protected]>
wrote:
> Thanks for the update Pramod. Made a few comments above. I will think a
> bit more about the overall approach to providing these options. If reader
> is initialized with topic, the partition list is read at the graph building
> time. I don't have much experience with templates, but the graph building
> phase is not executed for a template, right? When is the number of source
> partitions decided in a template?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 95674)
Time Spent: 5h 20m (was: 5h 10m)
> Allow ValueProvider for KafkaIO
> -------------------------------
>
> Key: BEAM-3925
> URL: https://issues.apache.org/jira/browse/BEAM-3925
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Sameer Abhyankar
> Assignee: Pramod Upamanyu
> Priority: Major
> Time Spent: 5h 20m
> Remaining Estimate: 0h
>
> Add ValueProvider support for the various methods in KafkaIO. This would
> allow us to use KafkaIO in reusable pipeline templates.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)