Hi Mohil,

I've responded in another thread you started; please see:
https://lists.apache.org/thread.html/r93066c4a3daefba954b23dbdb4536159ff3ff29b06052b92e747da17%40%3Cdev.beam.apache.org%3E

Regards,
Mikhail.

On Tue, Jan 28, 2020 at 10:22 AM Mohil Khare <[email protected]> wrote:

> Hi,
> This is Mohil Khare from San Jose, California. I work at an early-stage
> startup: Prosimo.
> We use Apache Beam with GCP Dataflow for all real-time stats processing,
> with Kafka and Pub/Sub as data sources and Elasticsearch and GCS as sinks.
>
> I am trying to solve the following use case with side inputs.
>
> INPUT:
> 1. We have a continuous stream of data coming from Pub/Sub topicA. This
> data can be put into a KV PCollection, and each data item can be uniquely
> identified by a certain key.
> 2. We have a very slowly changing stream of data coming from Pub/Sub
> topicB, i.e. a burst of data arrives on topicB for a few minutes, followed
> by no activity for a long period. This stream can again be put into a KV
> PCollection with the same keys as above. NOTE: after a long period of
> inactivity, data may arrive for only certain keys.
>
> DESIRED OUTPUT/PROCESSING:
> 1. I want to use the KV PCollection from topicB as a side input to enrich
> data arriving on topicA. I think View.asMap could be a good choice for it.
> 2. After enriching data from topicA using the side-input data from topicB,
> write to GCS in a fixed window of 10 minutes.
> 3. I want to continue using the above PCollectionView as a side input as
> long as no new data arrives on topicB.
> 4. Whenever new data arrives on topicB, I want to update the
> PCollectionView map only for the set of keys that arrived in the new
> stream.
>
> My question is: what would be the best approach to tackle this use case?
> I would really appreciate it if someone could suggest a good solution.
>
> Thanks and Regards
> Mohil Khare
>
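[Editor's note: the key-wise refresh described in item 4 above, i.e. updating the enrichment map only for the keys that arrived in the new topicB burst, can be sketched in plain Python. This is a hypothetical standalone illustration of the update semantics only; in Beam this would typically be realized with a periodically refreshed `View.asMap` side input, which this sketch does not implement. All names below are invented for illustration.]

```python
def update_enrichment_map(current, new_records):
    """Return a new map: `current`, overridden only for keys present in
    `new_records`. Keys absent from the new burst keep their old values."""
    merged = dict(current)      # keep all previously known keys
    merged.update(new_records)  # override/add only the keys that arrived
    return merged


def enrich(event, enrichment_map):
    """Join a topicA event (key, payload) with its enrichment data by key;
    events whose key has no enrichment entry get None for metadata."""
    key, payload = event
    return (key, {"payload": payload, "meta": enrichment_map.get(key)})


# Example: a burst arrives for k2 (updated) and k3 (new); k1 is untouched.
side_map = {"k1": "meta1", "k2": "meta2"}
burst = {"k2": "meta2-new", "k3": "meta3"}
side_map = update_enrichment_map(side_map, burst)
# side_map is now {"k1": "meta1", "k2": "meta2-new", "k3": "meta3"}

enriched = enrich(("k2", 42), side_map)
# enriched is ("k2", {"payload": 42, "meta": "meta2-new"})
```

The design point this illustrates: the refresh is a merge, not a replacement, so keys that were silent during the burst retain their previous enrichment values.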
