lazarillo commented on issue #19553:
URL: https://github.com/apache/beam/issues/19553#issuecomment-2329902596

   It looks like this isn't moving anywhere.
   
   What is the preferred / ideal solution for combining two streams?
   
   Here are my specifics, in case there are some good examples:
   
   - I am combining two Pub/Sub streams
   - Each stream is too large to be treated as a side input
   - Each record in one stream can be joined to multiple records in the other 
stream
   - The time window is difficult to determine: matching records could arrive 
within seconds or minutes of each other, or days apart
   
   The specific case I have is in payment processing:
   
   - I have an initial payment transaction with all of the initial details
   - I have payment transaction "events", e.g. the payment is created, 
authorized, captured, refunded, etc.
   - Each event carries the ID of the initial payment, so the join is a simple 
one on the payment ID
   - But the number of events per payment varies: there might be 6, or maybe 20
   - And the events may all occur within a few seconds, or they may be 
associated with payments from weeks or months ago
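   
   To pin down the shape of the data, here is a plain-Python sketch (not Beam 
code) of the join result I am after, ignoring time entirely.  The class names, 
the `amount_cents` field and the `join_on_payment_id` helper are made up for 
illustration; only the payment ID and the event kinds come from my real data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Payment:
    payment_id: str
    amount_cents: int  # hypothetical field standing in for "all of the initial details"


@dataclass
class PaymentEvent:
    payment_id: str  # ID of the initial payment, used as the join key
    kind: str        # e.g. "created", "authorized", "captured", "refunded"


def join_on_payment_id(payments, events):
    """Return {payment_id: (payment, [events...])} for every payment seen."""
    events_by_id = defaultdict(list)
    for event in events:
        events_by_id[event.payment_id].append(event)
    # A payment with no events yet still appears, with an empty event list.
    return {p.payment_id: (p, events_by_id.get(p.payment_id, [])) for p in payments}
```

   So the output is one record per payment with a variable-length list of 
events attached, which is the one-to-many shape I need the streaming join to 
produce.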
   
   I was initially thinking of windowing both streams with a reasonable window 
of maybe a minute or two, i.e. whatever delay we find acceptable before the data 
reaches our storage.  Within that window, all the payments will be retained and 
available for joining the payment events to them.  Any payment event that is 
associated with a payment that has already fallen out of its window can be 
handled by pulling that payment from storage and appending the new event to it.
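   
   To make the semantics concrete, here is a plain-Python sketch (again not 
Beam code, just the behaviour I would expect from it) of that idea: assign each 
record to a fixed window by timestamp, join events to payments that share a 
window, and set aside the events that need the storage lookup.  The 60-second 
window, the tuple layouts and the function names are assumptions for 
illustration only.

```python
WINDOW_SECONDS = 60  # assumed window size ("a minute or two")


def window_start(timestamp, size=WINDOW_SECONDS):
    """Start of the fixed window containing `timestamp` (seconds since epoch)."""
    return timestamp - (timestamp % size)


def windowed_join(payments, events, size=WINDOW_SECONDS):
    """Join events to payments that fall in the same fixed window.

    payments: iterable of (timestamp, payment_id, details)
    events:   iterable of (timestamp, payment_id, kind)

    Returns (joined, needs_storage):
      joined maps (window_start, payment_id) -> {"payment": ..., "events": [...]}
      needs_storage lists events whose payment is not in the same window
    """
    joined = {}
    for ts, payment_id, details in payments:
        joined[(window_start(ts, size), payment_id)] = {"payment": details, "events": []}

    needs_storage = []
    for ts, payment_id, kind in events:
        key = (window_start(ts, size), payment_id)
        if key in joined:
            joined[key]["events"].append(kind)
        else:
            # The payment is outside this window: fall back to the storage lookup.
            needs_storage.append((ts, payment_id, kind))
    return joined, needs_storage
```

   For example, with a payment at t=5 and events at t=10, t=20 and t=200, the 
first two events join in the window starting at 0, while the t=200 event ends 
up in `needs_storage`.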
   
   I'd like to have a nice example for the first half of what I proposed, 
where both the payment and the payment event are available in the same time 
window.
   
   There are several examples of how to do this with side inputs, but I cannot 
seem to find one that joins two large streams (e.g., two Pub/Sub or Kafka 
streams).
   
   I would ideally like an example in Python, but any language is fine.

