[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301685#comment-15301685
 ] 

Chandni Singh commented on APEXMALHAR-2063:
-------------------------------------------

Window Data Manager supports dynamic partitioning of an operator by allowing 
an instance to read the data saved for a particular window id by all operator 
instances. The following method provides that support:
{code}
  /**
   * When an operator can partition itself dynamically then there is no
   * guarantee that an input state which was being handled by one instance
   * previously will be handled by the same instance after partitioning. <br/>
   * For example, an {@link AbstractFileInputOperator} instance which reads a
   * File X till offset l (not check-pointed) may no longer be the instance
   * that handles file X after repartitioning, as the number of instances may
   * have changed and file X is re-hashed to another instance. <br/>
   * The new instance wouldn't know from what point to read the File X unless
   * it reads the idempotent storage of all the operators for the window being
   * replayed and fixes its state.
   *
   * @param windowId window id.
   * @return mapping of operator id to the corresponding state
   * @throws IOException
   */
  Map<Integer, Object> load(long windowId) throws IOException;
{code}
Providing this support with FileSystemWAL is complicated. Currently the 
FileSystemWAL reader and writer are assumed to be in the same physical 
partition, but the method above requires multiple readers that are in 
different physical partitions from the writer.

So the FileSystemWAL needs to be changed so that it can be used in read-only 
mode.
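
To make the use case concrete, here is a minimal sketch of how a repartitioned 
instance might rebuild its state from the result of load(windowId). It is an 
illustration only, not the Malhar implementation: the class name, the 
mergeOffsets helper, and the assumption that each operator's saved state is a 
Map of file path to offset are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: not part of Apex Malhar.
class ReplayStateSketch {

  /**
   * Merges the per-operator states returned for one window.
   * Assumes (for illustration) that every state is a Map<String, Long>
   * of file path to the offset read so far. For each file, the largest
   * offset recorded by any previous instance is kept.
   */
  static Map<String, Long> mergeOffsets(Map<Integer, Object> statesByOperator) {
    Map<String, Long> merged = new HashMap<>();
    for (Object state : statesByOperator.values()) {
      @SuppressWarnings("unchecked")
      Map<String, Long> offsets = (Map<String, Long>) state;
      for (Map.Entry<String, Long> e : offsets.entrySet()) {
        merged.merge(e.getKey(), e.getValue(), Long::max);
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    // Stand-in for the result of load(windowId): operator id -> saved state.
    Map<Integer, Object> saved = new HashMap<>();
    Map<String, Long> op1 = new HashMap<>();
    op1.put("fileX", 120L);
    Map<String, Long> op2 = new HashMap<>();
    op2.put("fileY", 40L);
    saved.put(1, op1);
    saved.put(2, op2);

    // A new instance that now owns fileX can look up where to resume.
    Map<String, Long> merged = mergeOffsets(saved);
    System.out.println("resume fileX at offset " + merged.get("fileX"));
  }
}
```

The point of the sketch is that the new instance needs the state of every 
previous instance for the window, which is why readers in other physical 
partitions must be able to open the WAL.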

> Integrate WAL to FS WindowDataManager
> -------------------------------------
>
>                 Key: APEXMALHAR-2063
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2063
>             Project: Apache Apex Malhar
>          Issue Type: Improvement
>            Reporter: Chandni Singh
>            Assignee: Chandni Singh
>
> FS Window Data Manager is used to save meta-data that helps in replaying the 
> tuples of every completed application window after a failure. For this it 
> saves meta-data in one file per window. Having many small files on HDFS 
> causes issues, as highlighted here:
> http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
> Instead, FS Window Data Manager can use the WAL to write data and maintain 
> a mapping of how much data was flushed to the WAL in each window.
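
The mapping described above could look roughly like the following in-memory 
sketch. It only illustrates the idea of recording the WAL offset reached at 
the end of each window; the class and method names are hypothetical and it 
does not use the real FileSystemWAL API or write any files.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: tracks, per window, the WAL offset reached when
// that window ended, so the bytes of a window can be located later.
class WindowOffsetIndexSketch {
  // window id -> WAL length (end offset) when the window ended
  private final TreeMap<Long, Long> endOffsetByWindow = new TreeMap<>();
  private long walLength = 0;

  /** Simulates appending numBytes of data to the WAL. */
  void append(int numBytes) {
    walLength += numBytes;
  }

  /** Records how far the WAL had grown when the window ended. */
  void endWindow(long windowId) {
    endOffsetByWindow.put(windowId, walLength);
  }

  /** Returns {start, end} offsets of the data written during the window. */
  long[] rangeFor(long windowId) {
    Long end = endOffsetByWindow.get(windowId);
    if (end == null) {
      throw new IllegalArgumentException("unknown window " + windowId);
    }
    // The window's data starts where the previous recorded window ended.
    Map.Entry<Long, Long> prev = endOffsetByWindow.lowerEntry(windowId);
    long start = (prev == null) ? 0L : prev.getValue();
    return new long[] {start, end};
  }
}
```

With such an index, all windows share one log and a replay only needs the 
offset range of the window being replayed, instead of one small file per 
window.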



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
