GitHub user xintongsong added a comment to the discussion: Replay-based Per 
Action State Consistency

@letaoj Thanks for updating the design doc based on our offline discussion. I 
think the overall design is quite good now. I just have a few more comments on 
the details.

1. For the request-response map, I think we should use a unique identifier of 
the action execution as the key, rather than the hash of the event, because one 
event may trigger multiple actions. `TaskActionState` looks right in this 
respect, since it tries to capture the execution state of an action. But the 
execution flow shows hashes of events being used as part of the map key.

2. I'd suggest not rebuilding the short-term memory at the beginning, but 
rebuilding it while replaying the actions. To be specific, when recovering from 
a checkpoint, the short-term memory (state) should be restored to how it was 
when the checkpoint was made. Then we replay the inputs and check whether each 
action has already been performed. If it has, we skip the action, apply any 
state changes it made, and get its output (events). This ensures that actions 
being re-executed see the same state as when they were executed the first time.

3. `<message_key>-<event_hash_1>: {"request": request, "short-term-memory": 
short_term_memory.dump_json()}` Does this mean we are storing the whole 
short-term memory for each request-response pair? That should be unnecessary. 
Since the full short-term memory is already persisted with the checkpoint, we 
only need to persist the incremental changes to short-term memory since the 
checkpoint.

4. `TaskActionState.output_event` should be a list, because each action may 
emit multiple events.
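
To make points 1-4 concrete, here is a minimal Python sketch of how I picture the 
replay step. All names in it (`ActionRecord`, `run_or_replay`, the string 
execution id, the dict-based memory) are hypothetical and only for illustration; 
they are not the existing flink-agents API or what the design doc defines. It 
assumes an action returns the memory changes it wants to make plus a list of 
output events, and that the log of completed executions survives the failure.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class ActionRecord:
    """Hypothetical record persisted for one completed action execution."""
    # Only the changes the action made, not a dump of the whole memory (point 3).
    memory_delta: Dict[str, Any] = field(default_factory=dict)
    # A list, because one action may emit several events (point 4).
    output_events: List[Any] = field(default_factory=list)


# Assumed action signature: (event, read-only view of memory) -> (delta, events).
Action = Callable[[Any, Dict[str, Any]], Tuple[Dict[str, Any], List[Any]]]


def run_or_replay(
    execution_id: str,
    action: Action,
    event: Any,
    memory: Dict[str, Any],
    log: Dict[str, ActionRecord],
) -> List[Any]:
    """Run the action once; on replay, reuse its recorded delta and events.

    `execution_id` is assumed to identify one action execution uniquely
    (point 1), so two actions triggered by the same event get different keys.
    """
    record = log.get(execution_id)
    if record is None:
        # First execution: the action sees a copy of the current memory.
        delta, outputs = action(event, dict(memory))
        record = ActionRecord(dict(delta), list(outputs))
        log[execution_id] = record
    # Both paths apply the recorded delta, so memory is rebuilt step by step
    # during replay instead of up front (point 2).
    memory.update(record.memory_delta)
    return record.output_events


# --- usage: one action, executed once and then replayed after a "failure" ---
calls: List[Any] = []


def count_action(event: Any, memory: Dict[str, Any]):
    calls.append(event)  # tracks how often the action body really runs
    n = memory.get("count", 0) + 1
    return {"count": n}, [f"seen-{n}", f"echo-{event}"]


log: Dict[str, ActionRecord] = {}
# One possible id layout (an assumption): message key / event sequence / action.
exec_id = "key-1/0/count_action"

memory = {"count": 0}  # state as of the checkpoint
first_outputs = run_or_replay(exec_id, count_action, "a", memory, log)

restored = {"count": 0}  # after failure: memory restored from the checkpoint
replayed_outputs = run_or_replay(exec_id, count_action, "a", restored, log)
```

In this sketch the replayed call does not invoke `count_action` again, yet 
`restored` ends up with the same contents as `memory` and the same events are 
returned. A plain `dict.update` delta cannot express deletions, so a real 
implementation would probably need a richer change format.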

GitHub link: 
https://github.com/apache/flink-agents/discussions/108#discussioncomment-14209491

----
This is an automatically sent email for issues@flink.apache.org.
To unsubscribe, please send an email to: issues-unsubscr...@flink.apache.org
