lazarillo commented on issue #19553: URL: https://github.com/apache/beam/issues/19553#issuecomment-2329902596
It looks like this isn't moving anywhere. What is the preferred / ideal solution for combining two streams? Here are my specifics, in case there are some good examples: - I am combining two Pub/Sub streams - Each of the streams is large enough that I cannot treat either of them as a side input - One of the streams will be joined multiple times to the other stream - The time window is difficult to determine ==> it could be seconds or minutes, or it could take days The specific case I have is in payment processing: - I have an initial payment transaction with all of the initial details - I have payment transaction "events" like the payment is created, authorized, captured, refunded, etc - Each event has the ID of the initial payment, so that it can be joined simply on the payment ID - But there might be 6 events, or maybe 20 events - And the events may occur over the span of a few seconds and be done, or there may be events that are associated with payments from weeks or months ago I was initially thinking of creating two streams with a reasonable window of maybe a minute or two; whatever we can find an acceptable delay for reaching our storage. Then in that window, all the payments will be saved and available for joining the payment events to them. Any payment events that are associated with a payment that has fallen outside of the window wherein it is retained can be pulled from storage to append the new events. I'd like to have a nice example for that first half of what I proposed: where both the payment and payment event are available in the time window. There are several examples of how to do this with side inputs, but I cannot seem to find streams of two large datasets (eg, two Pub/Sub or Kafka streams). I would ideally like an example in Python, but any language is fine. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
