Composites are very much supported; the guide is in progress ;-) You can see, for example, the CountWords composite <https://github.com/apache/beam/blob/d86db15ba22cbd99093327dc4962e06fa2d5db43/examples/java/src/main/java/org/apache/beam/examples/WordCount.java#L126>.
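For a quick look at what that linked composite boils down to, here is a condensed sketch along the lines of CountWords: a PTransform subclass whose expand() chains the common steps. (In recent SDKs the method to override is expand(); both classes below are meant to sit as static nested classes inside your pipeline class, as in WordCount.java.)

    import org.apache.beam.sdk.transforms.Count;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.PTransform;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollection;

    /** Splits each line into words. */
    static class ExtractWordsFn extends DoFn<String, String> {
      @ProcessElement
      public void processElement(ProcessContext c) {
        for (String word : c.element().split("[^\\p{L}]+")) {
          if (!word.isEmpty()) {
            c.output(word);
          }
        }
      }
    }

    /** A composite transform: extract words, then count occurrences of each. */
    static class CountWords
        extends PTransform<PCollection<String>, PCollection<KV<String, Long>>> {
      @Override
      public PCollection<KV<String, Long>> expand(PCollection<String> lines) {
        // The composite is just the common series of transforms under one name.
        return lines
            .apply(ParDo.of(new ExtractWordsFn()))
            .apply(Count.perElement());
      }
    }

The payoff is that the whole series of transformations gets a single name and can be applied to any PCollection<String>, whichever IO produced it:

    PCollection<KV<String, Long>> counts = lines.apply(new CountWords());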
Not sure what you mean by "performance", could you elaborate on that please ? The Spark runner currently supports batch, streaming support is being rolled-out these days so you should keep track. On Wed, Feb 15, 2017 at 10:42 AM ankit beohar <[email protected]> wrote: > Amit > > Thanks for your fast response now I got it, my use case will solve using > composite transforms (which is in progress I guess). > But if I twist my logic and put in a way you mentioned to just use > different I/O and run on top of SPARK then I guess BEAM will handle batch > and streaming performance issue right? > > Best Regards, > ANKIT BEOHAR > > > On Wed, Feb 15, 2017 at 2:03 PM, Amit Sela <[email protected]> wrote: > > > Oh, missed your question on which one is better.... it really depends on > > your use case. > > If the data is homogenous, and you want to write to the same IO, I don't > > see a reason not to Flatten them into one PCollection. > > If you want to write files-to-files and Kafka-to-Kafka you might be > better > > off with two separate pipelines, batch and streaming. And to make things > > even more elegant you could "compact" your (common) series of > > transformations into a single composite transform such that you end-up > with > > something like: > > > > *lines.apply(MyComposite)* > > *moreLines.apply(MyComposite)* > > > > Composite transforms programming guide is still under construction, > should > > be available here once ready : > > https://beam.apache.org/documentation/programming- > > guide/#transforms-composite > > > > > > On Wed, Feb 15, 2017 at 10:28 AM Amit Sela <[email protected]> wrote: > > > > > You can write one pipeline and simply replace the IO, for example: > > > > > > To read from (text) files you can use: > > > *PCollection<String> lines = > > > p.apply(TextIO.Read.from("file://some/inputData.txt")); * > > > > > > and from Kafka (I'm adding a generic key here because Kafka messages > are > > > keyed): > > > *PCollection<KV<K, String>> moreLines = p,apply(* > > > * KafkaIO.<K, String>read()* > > > * .withBootstrapServers("brokers.list")* > > > * .withTopics("topic-list")* > > > * .withKeyCoder(Coder<K>)* > > > * .withValueCoder(StringUtf8Coder.of()));* > > > > > > Now you can apply the same code to both PCollections, or (as you > > > mentioned) you can Flatten the together into one PCollection (after > > > removing the keys from Kafka-read PCollection) and apply the > > > transformations you want. > > > > > > You might find the IO section in the programming guide useful: > > > https://beam.apache.org/documentation/programming-guide/#io > > > > > > > > > On Wed, Feb 15, 2017 at 10:13 AM ankit beohar <[email protected] > > > > > wrote: > > > > > > Hi All, > > > > > > I have a use case where I have kafka and flat files so can I write one > > code > > > and run for both or I have to create two different pipelines or use > > > pipeline join in a one pipeline. > > > > > > Which one is better? > > > > > > Best Regards, > > > ANKIT BEOHAR > > > > > > > > >
