Hi all,

I’d like to discuss how we could design a processor to merge two data streams. 
We have had several versions of this component in the past, but none of them 
was completely satisfactory.

I would suggest two different processors for two common use cases:
The first one appends a label (e.g. for machine learning) to the data 
stream. The processor has two inputs, one with the sensor events (potentially 
at a high frequency) and one with the label information (usually at a much 
lower event frequency than the sensor events). The processor enriches the 
sensor stream with the selected properties of the label stream.
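
To make the first use case more concrete, here is a minimal sketch in plain 
Java. It assumes events are simple key/value maps and that each sensor event 
should carry the most recently seen label. The class name LabelEnricher and 
its methods are hypothetical, not an existing API, and the sketch is not 
thread-safe.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: enriches sensor events with the most recent label. */
final class LabelEnricher {
    // Names of the label-stream properties to copy onto sensor events.
    private final List<String> selectedProperties;
    // Last values seen on the label stream for the selected properties.
    private final Map<String, Object> latestLabel = new HashMap<>();

    LabelEnricher(List<String> selectedProperties) {
        this.selectedProperties = List.copyOf(selectedProperties);
    }

    /** Called for every event on the low-frequency label stream. */
    void onLabelEvent(Map<String, Object> labelEvent) {
        for (String property : selectedProperties) {
            if (labelEvent.containsKey(property)) {
                latestLabel.put(property, labelEvent.get(property));
            }
        }
    }

    /** Called for every sensor event; returns an enriched copy. */
    Map<String, Object> onSensorEvent(Map<String, Object> sensorEvent) {
        Map<String, Object> enriched = new HashMap<>(sensorEvent);
        // Before the first label arrives, nothing is added (assumption).
        enriched.putAll(latestLabel);
        return enriched;
    }
}
```

Only the last label needs to be kept as state, so this processor stays cheap 
even if the sensor stream is very fast.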

The second processor merges two streams by their timestamp. This could be 
implemented with Flink, but since it is a common use case I think we also need 
a lightweight solution in Java. What do you think?

Here are a couple of things we need to keep in mind when designing the 
component:
* How to deal with late arriving events?
* How big must the buffer (state) for the data streams be to synchronize the 
events (e.g. when there is a large delay in one of the streams)?
* Can we assume that the events within one stream arrive in order?
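
As a basis for discussion, here is a rough sketch of one possible lightweight 
merge in plain Java. It takes one position on each of the questions above, and 
all of these are assumptions, not decisions: events are paired when their 
timestamps differ by at most a tolerance, each side has a bounded buffer that 
evicts its oldest event when full, and an event that arrives out of order 
within its own stream is treated as late and dropped. The names 
TimestampMerger, Event and Pair are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch: pairs events of two streams by timestamp. */
final class TimestampMerger {
    static final class Event {
        final long timestamp;
        final String payload;
        Event(long timestamp, String payload) {
            this.timestamp = timestamp;
            this.payload = payload;
        }
    }

    static final class Pair {
        final Event left;
        final Event right;
        Pair(Event left, Event right) {
            this.left = left;
            this.right = right;
        }
    }

    /** Per-stream state: buffered events plus the newest timestamp seen. */
    private static final class Side {
        final Deque<Event> buffer = new ArrayDeque<>();
        long lastTimestamp = Long.MIN_VALUE;
    }

    private final long toleranceMs;
    private final int maxBufferSize;
    private final Side left = new Side();
    private final Side right = new Side();
    private long dropped = 0;

    TimestampMerger(long toleranceMs, int maxBufferSize) {
        this.toleranceMs = toleranceMs;
        this.maxBufferSize = maxBufferSize;
    }

    List<Pair> onLeft(Event event) { return offer(event, left); }

    List<Pair> onRight(Event event) { return offer(event, right); }

    /** Number of events discarded as late, evicted, or unmatched. */
    long droppedCount() { return dropped; }

    private List<Pair> offer(Event event, Side own) {
        // Assumption: an event older than the newest one already seen on
        // its own stream is late and is dropped.
        if (event.timestamp < own.lastTimestamp) {
            dropped++;
            return new ArrayList<>();
        }
        own.lastTimestamp = event.timestamp;
        // Bounded state: evict the oldest buffered event when full.
        if (own.buffer.size() >= maxBufferSize) {
            own.buffer.pollFirst();
            dropped++;
        }
        own.buffer.addLast(event);
        return drain();
    }

    private List<Pair> drain() {
        List<Pair> out = new ArrayList<>();
        while (!left.buffer.isEmpty() && !right.buffer.isEmpty()) {
            Event l = left.buffer.peekFirst();
            Event r = right.buffer.peekFirst();
            long diff = l.timestamp - r.timestamp;
            if (Math.abs(diff) <= toleranceMs) {
                out.add(new Pair(left.buffer.pollFirst(), right.buffer.pollFirst()));
            } else if (diff < 0) {
                // Left head is too old to match any current or future right event.
                left.buffer.pollFirst();
                dropped++;
            } else {
                right.buffer.pollFirst();
                dropped++;
            }
        }
        return out;
    }
}
```

With this approach the buffer size directly limits how much delay between the 
two streams can be tolerated, and unmatched events are silently discarded. 
Whether we would rather emit them unpaired is one of the open points.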

Do you have any other ideas about what we need to consider?

Cheers,
Philipp
