[ 
https://issues.apache.org/jira/browse/BEAM-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15876490#comment-15876490
 ] 

Kenneth Knowles commented on BEAM-1517:
---------------------------------------

In the runners I've added this GC to (direct and Dataflow) I was faced with the 
same choice that you are faced with in Flink:

1. Add the ability to clear a whole {{StateNamespace}} OR
2. Scrape the necessary tags off the {{DoFnSignature}}

I ended up going with 2 both times, because it was a bit easier to get done 
quickly. If your underlying storage already supports 1, then you would use the 
same sort of timer set up but just have code to write for cleanup. Since it 
seems pretty common that namespace clearing is not "for free" in different 
runners, I would not make it required as part of {{StateInternals}}. But I do 
think it seems nice to add to runner/core-java the logic for "clear all state 
in this {{DoFnSignature}}".

> Garbage collect user state in Flink Runner
> ------------------------------------------
>
>                 Key: BEAM-1517
>                 URL: https://issues.apache.org/jira/browse/BEAM-1517
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>            Reporter: Aljoscha Krettek
>            Assignee: Aljoscha Krettek
>
> User facing state/timers in Beam are bound to the key/window of the data. 
> Right now, the Flink Runner does not clean up user state when the watermark 
> passes the GC horizon for the state associated with a given window.
> Neither {{StateInternals}} nor the Flink state API support discarding state 
> for a whole namespace (which is the window in this case) so we might have to 
> manually set a GC timer for each window/key combination, as is done in the 
> {{ReduceFnRunner}}. For this we have to know all states a user can possibly 
> use, which we can get from the {{DoFn}} signature.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to