In Samza, local stores are currently persisted outside YARN's working directory so that they can be reused across restarts. However, there is no mechanism yet to periodically clean up local stores that are no longer in use. Here is a proposal detailing a possible way to accomplish this:
https://issues.apache.org/jira/secure/attachment/12826531/GCstalelocalstate.pdf
This is tracked in SAMZA-656. Any feedback or comments are welcome. Thanks.