Composites are very much supported; the guide is in progress ;-) You can see, for example, the CountWords composite <https://github.com/apache/beam/blob/d86db15ba22cbd99093327dc4962e06fa2d5db43/examples/java/src/main/java/org/apache/beam/examples/WordCount.java#L126>.
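For a quick look at what that linked composite boils down to, here is a condensed sketch along the lines of CountWords: a PTransform subclass whose expand() chains the common steps. (In recent SDKs the method to override is expand(); both classes below are meant to sit as static nested classes inside your pipeline class, as in WordCount.java.)

    import org.apache.beam.sdk.transforms.Count;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.PTransform;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollection;

    /** Splits each line into words. */
    static class ExtractWordsFn extends DoFn<String, String> {
      @ProcessElement
      public void processElement(ProcessContext c) {
        for (String word : c.element().split("[^\\p{L}]+")) {
          if (!word.isEmpty()) {
            c.output(word);
          }
        }
      }
    }

    /** A composite transform: extract words, then count occurrences of each. */
    static class CountWords
        extends PTransform<PCollection<String>, PCollection<KV<String, Long>>> {
      @Override
      public PCollection<KV<String, Long>> expand(PCollection<String> lines) {
        // The composite is just the common series of transforms under one name.
        return lines
            .apply(ParDo.of(new ExtractWordsFn()))
            .apply(Count.perElement());
      }
    }

The payoff is that the whole series of transformations gets a single name and can be applied to any PCollection<String>, whichever IO produced it:

    PCollection<KV<String, Long>> counts = lines.apply(new CountWords());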
Not sure what you mean by "performance", could you elaborate on that please ? The Spark runner currently supports batch, streaming support is being rolled-out these days so you should keep track. On Wed, Feb 15, 2017 at 10:42 AM ankit beohar <[email protected]> wrote: > Amit > > Thanks for your fast response now I got it, my use case will solve using > composite transforms (which is in progress I guess). > But if I twist my logic and put in a way you mentioned to just use > different I/O and run on top of SPARK then I guess BEAM will handle batch > and streaming performance issue right? > > Best Regards, > ANKIT BEOHAR > > > On Wed, Feb 15, 2017 at 2:03 PM, Amit Sela <[email protected]> wrote: > > > Oh, missed your question on which one is better.... it really depends on > > your use case. > > If the data is homogenous, and you want to write to the same IO, I don't > > see a reason not to Flatten them into one PCollection. > > If you want to write files-to-files and Kafka-to-Kafka you might be > better > > off with two separate pipelines, batch and streaming. And to make things > > even more elegant you could "compact" your (common) series of > > transformations into a single composite transform such that you end-up > with > > something like: > > > > *lines.apply(MyComposite)* > > *moreLines.apply(MyComposite)* > > > > Composite transforms programming guide is still under construction, > should > > be available here once ready : > > https://beam.apache.org/documentation/programming- > > guide/#transforms-composite > > > > > > On Wed, Feb 15, 2017 at 10:28 AM Amit Sela <[email protected]> wrote: > > > > > You can write one pipeline and simply replace the IO, for example: > > > > > > To read from (text) files you can use: > > > *PCollection<String> lines = > > > p.apply(TextIO.Read.from("file://some/inputData.txt")); * > > > > > > and from Kafka (I'm adding a generic key here because Kafka messages > are > > > keyed): > > > *PCollection<KV<K, String>> moreLines = p,apply(* > > > * KafkaIO.<K, String>read()* > > > * .withBootstrapServers("brokers.list")* > > > * .withTopics("topic-list")* > > > * .withKeyCoder(Coder<K>)* > > > * .withValueCoder(StringUtf8Coder.of()));* > > > > > > Now you can apply the same code to both PCollections, or (as you > > > mentioned) you can Flatten the together into one PCollection (after > > > removing the keys from Kafka-read PCollection) and apply the > > > transformations you want. > > > > > > You might find the IO section in the programming guide useful: > > > https://beam.apache.org/documentation/programming-guide/#io > > > > > > > > > On Wed, Feb 15, 2017 at 10:13 AM ankit beohar <[email protected] > > > > > wrote: > > > > > > Hi All, > > > > > > I have a use case where I have kafka and flat files so can I write one > > code > > > and run for both or I have to create two different pipelines or use > > > pipeline join in a one pipeline. > > > > > > Which one is better? > > > > > > Best Regards, > > > ANKIT BEOHAR > > > > > > > > >
