[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated APEXMALHAR-2063:
--------------------------------------
    Description: 
FS Window Data Manager saves meta-data that helps in replaying the tuples of 
every completed application window after a failure. To do this, it saves the 
meta-data in one file per window. Having many small files on HDFS causes 
issues, as highlighted here:
http://blog.cloudera.com/blog/2009/02/the-small-files-problem/

Instead, FS Window Data Manager can use the WAL to write data and maintain a 
mapping of how much data was flushed to the WAL in each window. 
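
The idea of one log plus a per-window mapping can be sketched as follows. This 
is only an illustrative Python sketch, not the Malhar implementation: the class 
WindowedLog and its methods are hypothetical names, and the mapping is kept in 
memory here, whereas a real implementation would have to persist it.

```python
import os
import tempfile


class WindowedLog:
    """Hypothetical append-only log that records each window's byte range."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # window id -> (start offset, length in bytes)
        self._pending = []  # data written during the current window
        open(path, "ab").close()  # create the log file if it is missing

    def write(self, payload: bytes):
        # Buffer data until the window ends.
        self._pending.append(payload)

    def end_window(self, window_id: int):
        # Append the window's data in one write and record where it landed.
        data = b"".join(self._pending)
        self._pending.clear()
        with open(self.path, "ab") as log:
            start = log.seek(0, os.SEEK_END)
            log.write(data)
            log.flush()
            os.fsync(log.fileno())
        self.index[window_id] = (start, len(data))

    def replay(self, window_id: int) -> bytes:
        # Read back exactly the bytes that the given window flushed.
        start, length = self.index[window_id]
        with open(self.path, "rb") as log:
            log.seek(start)
            return log.read(length)


log = WindowedLog(os.path.join(tempfile.mkdtemp(), "wal.log"))
log.write(b"tuple-1;")
log.write(b"tuple-2;")
log.end_window(1)
log.write(b"tuple-3;")
log.end_window(2)
replayed = log.replay(1)  # only the bytes written during window 1
```

All windows share a single file, so the number of files no longer grows with 
the number of windows; only the mapping grows.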

To use FileSystemWAL for replaying the data of a finished window, a few changes 
are made to FileSystemWAL, for the following reasons:

1. WindowDataManager needs to replay the data of every finished window, and 
that window may not be checkpointed. 
After recovery, FileSystemWAL truncates the WAL file to the checkpointed point, 
which poses a problem for finished windows written after that point. 
WindowDataManager should be able to control the recovery of FileSystemWAL.

2. FileSystemWAL writes to temporary files. The mapping of temporary files to 
actual files is part of its state, which is checkpointed. Since 
WindowDataManager replays data of a window that is not yet checkpointed, it 
needs to know the actual temporary file the data is being persisted to.
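
The truncation problem in point 1 can be illustrated with a small sketch. This 
is an assumption about what caller-controlled recovery could look like, not the 
actual FileSystemWAL change: recovery_offset, truncate_log and the 
window_extents mapping (window id to start offset and length) are hypothetical 
names used only for illustration.

```python
import os
import tempfile


def recovery_offset(window_extents, checkpointed_offset, keep_finished_windows):
    """Pick the offset the log is truncated to after a failure.

    window_extents maps window id -> (start offset, length) for every
    finished window, whether or not it was checkpointed.
    """
    if not keep_finished_windows or not window_extents:
        # Default behaviour: fall back to the checkpointed point.
        return checkpointed_offset
    last_end = max(start + length for start, length in window_extents.values())
    return max(checkpointed_offset, last_end)


def truncate_log(path, offset):
    # Drop everything after the chosen offset, e.g. a partially written window.
    with open(path, "r+b") as log:
        log.truncate(offset)


# Window 1 occupies bytes 0-3 and is checkpointed; window 2 occupies bytes 4-6
# and finished after the checkpoint; bytes 7-9 belong to an unfinished window.
path = os.path.join(tempfile.mkdtemp(), "wal.log")
with open(path, "wb") as f:
    f.write(b"0123456789")

extents = {1: (0, 4), 2: (4, 3)}
truncate_log(path, recovery_offset(extents, 4, keep_finished_windows=True))
```

Truncating to the checkpointed offset alone would cut the file back to 4 bytes 
and discard window 2, which is why the caller needs a say in where recovery 
stops.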

At a high level, WindowDataManager will persist meta information on the file 
system, which includes:
- for every window 






  was:
FS Window Data Manager saves meta-data that helps in replaying the tuples of 
every completed application window after a failure. To do this, it saves the 
meta-data in one file per window. Having many small files on HDFS causes 
issues, as highlighted here:
http://blog.cloudera.com/blog/2009/02/the-small-files-problem/

Instead, FS Window Data Manager can use the WAL to write data and maintain a 
mapping of how much data was flushed to the WAL in each window.


> Integrate WAL to FS WindowDataManager
> -------------------------------------
>
>                 Key: APEXMALHAR-2063
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2063
>             Project: Apache Apex Malhar
>          Issue Type: Improvement
>            Reporter: Chandni Singh
>            Assignee: Chandni Singh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
