damccorm opened a new issue, #20310:
URL: https://github.com/apache/beam/issues/20310

   I have a use case that I think might be a good addition to the pipeline 
patterns:
   
    
   Beam (Java SDK) reads two kinds of records from a data stream such as Kafka:
    
   1. Records of type A, containing a key and the corresponding metadata. 
   2. Records of type B, containing the same key but no metadata. 
    
   Beam then needs to fill in the metadata for records of type B by looking it 
up with the keys received in records of type A. 
    
   The idea is to save the metadata (that is, state) for keys received in 
records of type A and then look it up when records of type B arrive.
    Beam's state API (the `@StateId` annotation in a stateful `DoFn`) can be 
used here; the problem is that we don't know when keys should expire. I don't 
think keeping the state in the global window is a good idea, as there could be 
many keys (maybe millions over time) to hold in state.
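
   To make the concern concrete, here is a minimal plain-Java sketch of the 
save-then-lookup idea. It deliberately uses no Beam APIs: the class 
`MetadataEnricher`, its method names, and the `HashMap` standing in for per-key 
state are all hypothetical, chosen only to illustrate the control flow. Note 
that nothing ever removes an entry, which is the unbounded-growth problem 
described above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Plain-Java model of per-key state. All names here are hypothetical. */
class MetadataEnricher {
    // Stands in for per-key state. Entries are never evicted, so this map
    // only grows as new keys arrive.
    private final Map<String, String> metadataByKey = new HashMap<>();

    /** Type A record: remember the metadata for this key. */
    void onTypeA(String key, String metadata) {
        metadataByKey.put(key, metadata);
    }

    /** Type B record: look up metadata saved by an earlier type A record. */
    Optional<String> onTypeB(String key) {
        return Optional.ofNullable(metadataByKey.get(key));
    }

    /** Number of keys currently held in state. */
    int stateSize() {
        return metadataByKey.size();
    }
}
```

   Calling `onTypeA("k1", "m1")` and then `onTypeB("k1")` yields the saved 
metadata, while `onTypeB` for a key never seen in a type A record yields an 
empty `Optional`.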
    
   One possible solution as suggested by Reza Ardeshir Rokni 
([email protected]):
    
   We can keep the state in a large fixed window (1 day or so), so that GC 
can happen at the window boundary. After the window expires, save the metadata 
values to an external DB such as BigQuery. If a record with the same key 
arrives in a new window and needs this metadata, fetch the metadata for that 
key from the external DB and save it in the new window's state again.
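
   The suggestion above can be sketched in plain Java as follows. This is a 
model of the logic only, not Beam code: `WindowedMetadataStore`, its methods, 
and the two maps (one standing in for window-scoped state, one for the external 
DB) are hypothetical names. It also assumes records arrive in timestamp order; 
a record with an older timestamp is simply handled in the current window.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java model of window-scoped state with an external fallback. Names are hypothetical. */
class WindowedMetadataStore {
    private final long windowSizeMillis;
    // Stands in for the external DB (BigQuery in the suggestion above).
    private final Map<String, String> externalDb = new HashMap<>();
    // Stands in for state scoped to the current fixed window.
    private final Map<String, String> windowState = new HashMap<>();
    private long currentWindowStart = Long.MIN_VALUE;

    WindowedMetadataStore(long windowSizeMillis) {
        this.windowSizeMillis = windowSizeMillis;
    }

    // Moves to the window containing the timestamp. When the previous window
    // has expired, its metadata is persisted and its state is dropped.
    private void advanceTo(long timestampMillis) {
        long windowStart = Math.floorDiv(timestampMillis, windowSizeMillis) * windowSizeMillis;
        if (windowStart > currentWindowStart) {
            externalDb.putAll(windowState);
            windowState.clear();
            currentWindowStart = windowStart;
        }
    }

    /** Type A record: save the metadata in the current window's state. */
    void onTypeA(String key, String metadata, long timestampMillis) {
        advanceTo(timestampMillis);
        windowState.put(key, metadata);
    }

    /** Type B record: returns the metadata for the key, or null if it was never seen. */
    String onTypeB(String key, long timestampMillis) {
        advanceTo(timestampMillis);
        String metadata = windowState.get(key);
        if (metadata == null) {
            // Miss in this window: fall back to the external DB and re-cache.
            metadata = externalDb.get(key);
            if (metadata != null) {
                windowState.put(key, metadata);
            }
        }
        return metadata;
    }

    /** Number of keys held in the current window's state. */
    int windowStateSize() {
        return windowState.size();
    }
}
```

   With this shape, the state held at any moment is bounded by the keys 
touched in the current window, at the cost of one external lookup per key per 
window on a miss.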
    
   Imported from Jira 
[BEAM-10019](https://issues.apache.org/jira/browse/BEAM-10019). Original Jira 
may contain additional context.
   Reported by: mohilkhare.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
