[ 
https://issues.apache.org/jira/browse/BEAM-638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15994601#comment-15994601
 ] 

Borisa Zivkovic commented on BEAM-638:
--------------------------------------

Just to contribute to this, might be useful for further discussion - as 
beginner user of Beam I also found it confusing that TextIO can not write data 
coming from Kafka directly. Looks a bit inconsistent even if there is good 
reason why is it so. 

Then I tried to do 

myStringsComingFromKafka.apply(Window.<String>into(FixedWindows.of(Duration.standardSeconds(1))))
        .apply("WritingToOutput", 
TextIO.Write.to("/myOutputLocation/").withNumShards(5));

but this also does not work since I need groupByKey or combine after window...

In the official documentation there are few examples how to create bounded 
collection from unbounded but they mostly focus on Combine and GroupByKey - it 
is not very clear what to do if you are working with unbounded collections that 
are not KV and you do not want to combine collections.

Basically all I want to do is 

1) read values (not KV) from kafka
2) write those values, as they are arriving, using TextIO

There might be an example how to do this but it was not easy for me to find it



> Add sink transform to write bounded data per window, pane, [and key] even 
> when PCollection is unbounded
> -------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-638
>                 URL: https://issues.apache.org/jira/browse/BEAM-638
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-core
>            Reporter: Jean-Baptiste Onofré
>            Assignee: Davor Bonaci
>
> Today, if the pipeline source is unbounded, and the sink expects a bounded 
> collection, there's no way to use a single pipeline. Even a window creates a 
> chunk on the unbounded PCollection, but the "sub" PCollection is still 
> unbounded.
> It would be helpful for users to have a Window function that create a bounded 
> PCollection (on the window) from an unbounded PCollection coming from the 
> source.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to