[
https://issues.apache.org/jira/browse/BEAM-3925?focusedWorklogId=95749&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-95749
]
ASF GitHub Bot logged work on BEAM-3925:
----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Apr/18 22:08
Start Date: 26/Apr/18 22:08
Worklog Time Spent: 10m
Work Description: rangadi commented on issue #5141: [BEAM-3925] Allow
ValueProvider for KafkaIO so that we can create Beam Templates using KafkaIO
URL: https://github.com/apache/beam/pull/5141#issuecomment-384804212
Let's step back a bit and see how a KafkaIO source would work in a template:
AFAIK, the driver part of launching a Dataflow templated pipeline runs only
once while installing the template. Each run of a template executes a
serialized job that was stored and does not run the driver that builds the
graph of computations.
This does not work well for Beam unbounded sources. E.g. `split()`[1] on an
unbounded source like KafkaIO is invoked on the driver, and its result also
determines parallelism. So how many splits should KafkaIO return? One option is
to return a fixed number, say 100, during template preparation and distribute
the actual partitions among those 100 splits at template runtime.
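To make the fixed-split option concrete, here is a rough sketch of how each of the pre-created splits could pick its share of partitions once the real partition count is known at runtime. This is not KafkaIO code: the class `FixedSplitAssignment`, the method `partitionsForSplit`, and the round-robin rule are all assumptions made for illustration, and partitions are represented as plain indices rather than Kafka `TopicPartition` objects.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of KafkaIO or Beam. */
public class FixedSplitAssignment {

  /**
   * Returns the partition indices that split {@code splitId} would read,
   * assuming a simple round-robin assignment over {@code numSplits} splits.
   */
  public static List<Integer> partitionsForSplit(
      int splitId, int numSplits, int numPartitions) {
    List<Integer> assigned = new ArrayList<>();
    // Split i takes partitions i, i + numSplits, i + 2 * numSplits, ...
    for (int p = splitId; p < numPartitions; p += numSplits) {
      assigned.add(p);
    }
    return assigned;
  }

  public static void main(String[] args) {
    int numSplits = 100;     // assumed to be fixed during template preparation
    int numPartitions = 250; // assumed to be discovered at template runtime
    for (int splitId = 0; splitId < 3; splitId++) {
      System.out.println(
          "split " + splitId + " -> "
              + partitionsForSplit(splitId, numSplits, numPartitions));
    }
  }
}
```

Under this scheme, when the topic has fewer partitions than splits, the extra splits get an empty list and would sit idle, which is one of the trade-offs of fixing the number up front.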
In order to iron out such issues early, it might be a good idea to try a
template before finalizing this PR. WDYT?
[1]:
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/UnboundedSource.java#L68
Issue Time Tracking
-------------------
Worklog Id: (was: 95749)
Time Spent: 5.5h (was: 5h 20m)
> Allow ValueProvider for KafkaIO
> -------------------------------
>
> Key: BEAM-3925
> URL: https://issues.apache.org/jira/browse/BEAM-3925
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Sameer Abhyankar
> Assignee: Pramod Upamanyu
> Priority: Major
> Time Spent: 5.5h
> Remaining Estimate: 0h
>
> Add ValueProvider support for the various methods in KafkaIO. This would
> allow us to use KafkaIO in reusable pipeline templates.