[ https://issues.apache.org/jira/browse/APEXMALHAR-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15391167#comment-15391167 ]
ASF GitHub Bot commented on APEXMALHAR-2063:
--------------------------------------------

Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/apex-malhar/pull/322#discussion_r71995041

    --- Diff: library/src/main/java/org/apache/apex/malhar/lib/wal/WindowDataManager.java ---
    @@ -41,15 +41,42 @@
      *
      * @since 2.0.0
      */
    -public interface WindowDataManager extends StorageAgent, Component<Context.OperatorContext>
    +public interface WindowDataManager extends Component<Context.OperatorContext>
     {
       /**
    +   * Save the state for a window id.
    +   * @param object    state
    +   * @param windowId  window id
    +   * @throws IOException
    +   */
    +  void save(Object object, long windowId) throws IOException;
    +
    +  /**
    +   * Gets the object saved for the provided window id. <br/>
    +   * Typically it is used to replay tuples of successive windows in input operators after failure.
    +   *
    +   * @param windowId window id
    +   * @return saved state for the window id.
    +   * @throws IOException
    +   */
    +  Object retrieve(long windowId) throws IOException;
    +
    +  /**
    +   * Delete the artifact corresponding to the
    --- End diff --

    complete javadoc here?


> Integrate WAL to FS WindowDataManager
> -------------------------------------
>
>                 Key: APEXMALHAR-2063
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2063
>             Project: Apache Apex Malhar
>          Issue Type: Improvement
>            Reporter: Chandni Singh
>            Assignee: Chandni Singh
>
> FS Window Data Manager is used to save meta-data that helps in replaying
> the tuples of every completed application window after failure. For this it
> saves meta-data in a file per window. Having many small files on HDFS
> causes issues, as highlighted here:
> http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
> Instead, FS Window Data Manager can utilize the WAL to write data and maintain
> a mapping of how much data was flushed to the WAL in each window.
> In order to use FileSystemWAL for replaying data of a finished window, a few
> changes are made to FileSystemWAL, for the following reasons:
> 1. WindowDataManager needs to replay the data of every finished window. This
> window may not be checkpointed.
> FileSystemWAL truncates the WAL file to the checkpointed point after recovery,
> so this poses a problem.
> WindowDataManager should be able to control recovery of FileSystemWAL.
> 2. FileSystemWAL writes to temporary files. The mapping of temp files to the
> actual file is part of its state, which is checkpointed. Since
> WindowDataManager replays data of a window not yet checkpointed, it needs to
> know the actual temporary file the data is being persisted to.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
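
The issue description proposes a single log plus a per-window record of how much data was flushed, in place of one file per window. That idea can be sketched roughly as follows. This is a minimal in-memory illustration, not the code from the pull request: the class WindowOffsetIndex and its methods are hypothetical, a byte buffer stands in for the WAL file, and nothing here uses FileSystemWAL or any Apex API.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical illustration only: none of these names come from Apex Malhar.
 * A single append-only log replaces the file-per-window layout, and an index
 * remembers where each window's data ends in that log.
 */
final class WindowOffsetIndex
{
  // Stand-in for the WAL file (assumption: a real implementation writes to a file system).
  private final ByteArrayOutputStream log = new ByteArrayOutputStream();

  // windowId -> offset in the log just past that window's data.
  private final TreeMap<Long, Long> endOffsets = new TreeMap<>();

  /** Appends the state of a window; windows must arrive in increasing order. */
  void save(byte[] state, long windowId)
  {
    if (!endOffsets.isEmpty() && windowId <= endOffsets.lastKey()) {
      throw new IllegalArgumentException(
          "window " + windowId + " is not newer than " + endOffsets.lastKey());
    }
    log.write(state, 0, state.length);
    endOffsets.put(windowId, (long)log.size());
  }

  /** Returns the bytes saved for a window, or null if the window is unknown. */
  byte[] retrieve(long windowId)
  {
    Long end = endOffsets.get(windowId);
    if (end == null) {
      return null;
    }
    // A window's data starts where the previous window's data ended.
    Map.Entry<Long, Long> previous = endOffsets.lowerEntry(windowId);
    long start = previous == null ? 0L : previous.getValue();
    return Arrays.copyOfRange(log.toByteArray(), (int)start, end.intValue());
  }
}
```

Saving two windows and then calling retrieve for the first one returns only the first window's bytes, because the index bounds each window by the previous window's end offset. The sketch leaves out the recovery concerns raised in points 1 and 2 (truncation to the checkpointed position and temporary files).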