Re: Stream and Batch Use Case

ankit beohar Wed, 15 Feb 2017 00:43:30 -0800

Amit

Thanks for your fast response, now I get it. My use case can be solved using
composite transforms (which are still in progress, I guess).
But if I twist my logic and, as you mentioned, just use different I/O and run
on top of Spark, then I guess Beam will handle the batch and streaming
performance issues, right?


Best Regards,
ANKIT BEOHAR


On Wed, Feb 15, 2017 at 2:03 PM, Amit Sela <[email protected]> wrote:

> Oh, missed your question on which one is better.... it really depends on
> your use case.
> If the data is homogeneous, and you want to write to the same IO, I don't
> see a reason not to Flatten them into one PCollection.
> If you want to write files-to-files and Kafka-to-Kafka you might be better
> off with two separate pipelines, batch and streaming. And to make things
> even more elegant you could "compact" your (common) series of
> transformations into a single composite transform such that you end up with
> something like:
>
> *lines.apply(MyComposite)*
> *moreLines.apply(MyComposite)*
>
> The composite transforms section of the programming guide is still under
> construction; it should be available here once ready:
> https://beam.apache.org/documentation/programming-guide/#transforms-composite
>
>
> On Wed, Feb 15, 2017 at 10:28 AM Amit Sela <[email protected]> wrote:
>
> > You can write one pipeline and simply replace the IO, for example:
> >
> > To read from (text) files you can use:
> > *PCollection<String> lines =*
> > *    p.apply(TextIO.Read.from("file://some/inputData.txt"));*
> >
> > and from Kafka (I'm adding a generic key here because Kafka messages are
> > keyed):
> > *PCollection<KV<K, String>> moreLines = p.apply(*
> > *    KafkaIO.<K, String>read()*
> > *        .withBootstrapServers("brokers.list")*
> > *        .withTopics("topic-list")*
> > *        .withKeyCoder(Coder<K>)*
> > *        .withValueCoder(StringUtf8Coder.of()));*
> >
> > Now you can apply the same code to both PCollections, or (as you
> > mentioned) you can Flatten them together into one PCollection (after
> > removing the keys from the Kafka-read PCollection) and apply the
> > transformations you want.
> >
> > You might find the IO section in the programming guide useful:
> > https://beam.apache.org/documentation/programming-guide/#io
> >
> >
> > On Wed, Feb 15, 2017 at 10:13 AM ankit beohar <[email protected]>
> > wrote:
> >
> > Hi All,
> >
> > I have a use case with both Kafka and flat files. Can I write one piece
> > of code and run it for both, or do I have to create two different
> > pipelines, or join them within one pipeline?
> >
> > Which one is better?
> >
> > Best Regards,
> > ANKIT BEOHAR
> >
> >
>
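
The two options discussed in this thread (apply one shared composite step to
each source separately, or drop the Kafka keys and flatten both sources into
one collection before applying it) can be illustrated with plain Java lists.
This is only an analogy under stated assumptions: it deliberately does not use
Beam's PTransform, Flatten, or IO classes, and the names `SharedTransformSketch`,
`myComposite`, `values`, `flatten`, and the sample data are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class SharedTransformSketch {

    // Hypothetical stand-in for a composite transform:
    // a fixed series of steps (trim, then upper-case) bundled under one name.
    static List<String> myComposite(List<String> lines) {
        return lines.stream()
                .map(String::trim)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    // Drops the keys from keyed records, analogous to removing the keys
    // from a Kafka-read collection before merging it with file lines.
    static List<String> values(List<Map.Entry<String, String>> keyed) {
        return keyed.stream()
                .map(Map.Entry::getValue)
                .collect(Collectors.toList());
    }

    // Concatenates two homogeneous collections into one.
    static List<String> flatten(List<String> a, List<String> b) {
        List<String> all = new ArrayList<>(a);
        all.addAll(b);
        return all;
    }

    public static void main(String[] args) {
        // Assumed sample inputs standing in for the file and Kafka sources.
        List<String> lines = List.of(" from file ");
        List<Map.Entry<String, String>> moreLines =
                List.of(Map.entry("k1", " from kafka "));

        // Option 1: keep the sources separate and reuse the same composite.
        System.out.println(myComposite(lines));
        System.out.println(myComposite(values(moreLines)));

        // Option 2: flatten into one collection, then apply the composite once.
        System.out.println(myComposite(flatten(lines, values(moreLines))));
    }
}
```

In both options the transformation logic is written once; the difference is
only whether the two sources stay separate (and can be written to separate
sinks) or are merged first.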
