Amit,

Thanks for your fast response, now I get it. My use case can be solved with composite transforms (the guide for which is still in progress, I guess). But if I rework my logic the way you mentioned, writing one pipeline, swapping only the I/O, and running it on top of Spark, then I guess Beam will handle the batch and streaming performance issues, right?
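To check that I understand the idea, here is a minimal plain-Java sketch of it. It deliberately does not use the Beam API: MY_COMPOSITE, readFromFile, and readFromKafka are hypothetical stand-ins made up for illustration, with in-memory lists in place of real sources. It only shows one shared chain of transformations being applied to two different inputs.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class SharedTransformSketch {

    // Hypothetical stand-in for a composite transform: one named, reusable
    // chain of steps (trim, drop empty lines, upper-case). Not Beam API.
    static final Function<List<String>, List<String>> MY_COMPOSITE =
        lines -> lines.stream()
            .map(String::trim)
            .filter(s -> !s.isEmpty())
            .map(String::toUpperCase)
            .collect(Collectors.toList());

    // Hypothetical stand-in for the file source (assumed sample data).
    static List<String> readFromFile() {
        return List.of("  first line ", "", "second line");
    }

    // Hypothetical stand-in for the Kafka source, with keys already dropped.
    static List<String> readFromKafka() {
        return List.of("event one", "   ", "event two");
    }

    public static void main(String[] args) {
        // Same transformation, two different inputs; only the source differs.
        System.out.println(MY_COMPOSITE.apply(readFromFile()));
        System.out.println(MY_COMPOSITE.apply(readFromKafka()));
    }
}
```

In a real pipeline the two lists would be the two PCollections and the shared function would be the composite transform, so only the read step would change between the batch and the streaming case.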
Best Regards,
ANKIT BEOHAR

On Wed, Feb 15, 2017 at 2:03 PM, Amit Sela <[email protected]> wrote:

> Oh, missed your question on which one is better... it really depends on
> your use case.
> If the data is homogeneous, and you want to write to the same IO, I don't
> see a reason not to Flatten them into one PCollection.
> If you want to write files-to-files and Kafka-to-Kafka you might be better
> off with two separate pipelines, batch and streaming. And to make things
> even more elegant you could "compact" your (common) series of
> transformations into a single composite transform such that you end up with
> something like:
>
>     lines.apply(MyComposite)
>     moreLines.apply(MyComposite)
>
> The composite transforms programming guide is still under construction; it
> should be available here once ready:
> https://beam.apache.org/documentation/programming-guide/#transforms-composite
>
> On Wed, Feb 15, 2017 at 10:28 AM Amit Sela <[email protected]> wrote:
>
> > You can write one pipeline and simply replace the IO, for example:
> >
> > To read from (text) files you can use:
> >
> >     PCollection<String> lines =
> >         p.apply(TextIO.Read.from("file://some/inputData.txt"));
> >
> > and from Kafka (I'm adding a generic key here because Kafka messages are
> > keyed):
> >
> >     PCollection<KV<K, String>> moreLines = p.apply(
> >         KafkaIO.<K, String>read()
> >             .withBootstrapServers("brokers.list")
> >             .withTopics("topic-list")
> >             .withKeyCoder(Coder<K>)
> >             .withValueCoder(StringUtf8Coder.of()));
> >
> > Now you can apply the same code to both PCollections, or (as you
> > mentioned) you can Flatten them together into one PCollection (after
> > removing the keys from the Kafka-read PCollection) and apply the
> > transformations you want.
> >
> > You might find the IO section in the programming guide useful:
> > https://beam.apache.org/documentation/programming-guide/#io
> >
> > On Wed, Feb 15, 2017 at 10:13 AM ankit beohar <[email protected]>
> > wrote:
> >
> > Hi All,
> >
> > I have a use case with Kafka and flat files as inputs. Can I write one
> > piece of code and run it for both, or do I have to create two different
> > pipelines, or use a pipeline join in one pipeline?
> >
> > Which one is better?
> >
> > Best Regards,
> > ANKIT BEOHAR
