[ https://issues.apache.org/jira/browse/APEXMALHAR-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15391167#comment-15391167 ]

ASF GitHub Bot commented on APEXMALHAR-2063:
--------------------------------------------

Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/apex-malhar/pull/322#discussion_r71995041
  
    --- Diff: library/src/main/java/org/apache/apex/malhar/lib/wal/WindowDataManager.java ---
    @@ -41,15 +41,42 @@
      *
      * @since 2.0.0
      */
    -public interface WindowDataManager extends StorageAgent, Component<Context.OperatorContext>
    +public interface WindowDataManager extends Component<Context.OperatorContext>
     {
       /**
    +   * Save the state for a window id.
    +   * @param object    state
    +   * @param windowId  window id
    +   * @throws IOException
    +   */
    +  void save(Object object, long windowId) throws IOException;
    +
    +  /**
    +   * Gets the object saved for the provided window id. <br/>
    +   * Typically it is used to replay tuples of successive windows in input operators after failure.
    +   *
    +   * @param windowId window id
    +   * @return saved state for the window id.
    +   * @throws IOException
    +   */
    +  Object retrieve(long windowId) throws IOException;
    +
    +  /**
    +   * Delete the artifact corresponding to the
    --- End diff --
    
    complete javadoc here?


> Integrate WAL to FS WindowDataManager
> -------------------------------------
>
>                 Key: APEXMALHAR-2063
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2063
>             Project: Apache Apex Malhar
>          Issue Type: Improvement
>            Reporter: Chandni Singh
>            Assignee: Chandni Singh
>
> FS Window Data Manager saves meta-data that helps replay the tuples of
> every completed application window after a failure. To do this, it saves the
> meta-data in one file per window. Having many small files on HDFS
> causes issues, as highlighted here:
> http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
> Instead, FS Window Data Manager can use the WAL to write data and maintain
> a mapping of how much data was flushed to the WAL in each window.
> To use FileSystemWAL for replaying the data of a finished window, a few
> changes are made to FileSystemWAL, for the following reasons:
> 1. WindowDataManager needs to replay the data of every finished window. This
> window may not be checkpointed.
> FileSystemWAL truncates the WAL file to the checkpointed point after recovery,
> so this poses a problem.
> WindowDataManager should be able to control the recovery of FileSystemWAL.
> 2. FileSystemWAL writes to temporary files. The mapping of temporary files to
> actual files is part of its state, which is checkpointed. Since
> WindowDataManager replays data of a window not yet checkpointed, it needs to
> know the actual temporary file the data is being persisted to.
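
The idea in the description, one append-only log plus a per-window record of how much data was flushed, can be sketched roughly as follows. This is only an illustration under assumptions: the class name WindowIndexedLog and its methods are hypothetical, the log is kept in memory instead of on a file system, and none of it is the actual FileSystemWAL or WindowDataManager code.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.TreeMap;

/** Hypothetical illustration only; not the Malhar FileSystemWAL or WindowDataManager API. */
class WindowIndexedLog
{
  // Stand-in for a single append-only WAL file (in memory to keep the sketch runnable).
  private final ByteArrayOutputStream log = new ByteArrayOutputStream();

  // windowId -> {start offset, length} of the bytes appended for that window.
  private final TreeMap<Long, long[]> index = new TreeMap<>();

  /** Appends the state of a window to the log and records where it landed. */
  void save(byte[] state, long windowId)
  {
    long start = log.size();
    log.write(state, 0, state.length);
    index.put(windowId, new long[] {start, state.length});
  }

  /** Returns the bytes saved for the window, or null if nothing was saved for it. */
  byte[] retrieve(long windowId)
  {
    long[] range = index.get(windowId);
    if (range == null) {
      return null;
    }
    byte[] all = log.toByteArray();
    return Arrays.copyOfRange(all, (int)range[0], (int)(range[0] + range[1]));
  }

  /** Largest window id that has saved state, or -1 if the log is empty. */
  long largestSavedWindow()
  {
    return index.isEmpty() ? -1L : index.lastKey();
  }
}
```

With this layout, many windows share one log instead of one small file each, and replaying a window only needs its recorded offset and length. A real implementation would also have to persist the index and handle the recovery and temporary-file concerns listed in points 1 and 2, which this sketch does not attempt.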



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
