[ 
https://issues.apache.org/jira/browse/BEAM-5786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16655604#comment-16655604
 ] 

Raghu Angadi edited comment on BEAM-5786 at 10/18/18 5:14 PM:
--------------------------------------------------------------

Sounds good.

More I think about completely dynamic case, I think it needs to be an SDF to 
handle the dynamic partitions correctly.

One problem I doin't know how to solve correctly in general case using current 
unbounded source API which would use fixed partitions is about watermark for 
empty splits: 

Say {{topics_X}} has a one partition and it is assigned to {{split_A}} at 
runtime. Now {{topic_X}} gets deleted and lets say there are no more partitions 
assigned to split_A. What should be the watermark reported by split_A? If we 
knew that no more partitions assigned to split_A ever in future, we could 
report MAX_TIME.

If there is no work around for this, we can support adding partitions and 
topics dynamically at runtime but should require that they not be deleted. With 
that, it is a fairly straight forward implementation. 


was (Author: rangadi):
Sounds good.

More I think about completely dynamic case, I think it needs to be an SDF to 
handle the dynamic partitions correctly.

One problem I doin't know how to solve correctly in general case using current 
unbounded source API which would use fixed partitions is about watermark for 
empty splits: 

Say topics_x a single partition as it was assigned to split_a at runtime. Now 
lets say topic_x is deleted later and there no more partitions assigned to 
split_a. What should the watermark reported by split_a? If we knew that no more 
partitions assigned to split_a ever in future, we could report MAX_TIME for 
watermark.

If there is no work around for this, we can support adding partitions and 
topics dynamically at runtime but should require that they be not deleted. With 
that it is a fairly straight forward implementation. 

> KafkaIO should support topic patterns
> -------------------------------------
>
>                 Key: BEAM-5786
>                 URL: https://issues.apache.org/jira/browse/BEAM-5786
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-kafka
>            Reporter: Alexey Romanenko
>            Assignee: Alexey Romanenko
>            Priority: Major
>
> For the moment, it is possible to set topics for KafkaIO.Read in 2 ways - one 
> single topic by using {{withTopic(String)}} or list of topics by using 
> {{withTopics(List<String>)}}. 
> It would make sense to provide another way for this using patterns, based on 
> regular expressions, like {{withTopics(“org_[0-9]+_source”)}}. In this case, 
> Kafka consumer could discover the topics based on user provided pattern.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to