[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15471393#comment-15471393
 ] 

Thomas Weise commented on APEXMALHAR-2130:
------------------------------------------

Which keys belong to which window can be treated as derived information when 
the key in managed state has the form <window><key>. We can then retrieve all 
keys for a given window by doing a prefix scan for <window>. This becomes 
difficult, however, when the scan must also include state that has not yet 
been flushed to the data files (they are compacted asynchronously).
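
To make the prefix scan idea concrete, here is a minimal sketch. It is not the 
ManagedState API: the class name, the two in-memory maps (standing in for 
flushed data files and for not-yet-flushed state) and the hex window encoding 
are all assumptions made for illustration. It shows that a scan for <window> 
has to consult both places and merge the results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only; this is not the ManagedState API.
class WindowPrefixScanSketch {
    // Stand-in for entries already compacted into the data files.
    private final TreeMap<String, String> flushed = new TreeMap<>();
    // Stand-in for entries that have not been flushed yet.
    private final TreeMap<String, String> unflushed = new TreeMap<>();

    // Fixed-width hex keeps lexicographic order equal to numeric order
    // for non-negative window ids (an assumed encoding).
    static String prefix(long window) {
        return String.format("%016x", window) + "/";
    }

    // New writes land in the unflushed part first, keyed by <window><key>.
    void put(long window, String key, String value) {
        unflushed.put(prefix(window) + key, value);
    }

    // Simulates the asynchronous compaction into the data files.
    void flush() {
        flushed.putAll(unflushed);
        unflushed.clear();
    }

    // Collects the <key> part of every entry whose composite key starts with the prefix.
    private static void scan(TreeMap<String, String> map, String prefix, TreeSet<String> out) {
        for (String composite : map.tailMap(prefix, true).keySet()) {
            if (!composite.startsWith(prefix)) {
                break; // sorted order: no further matches possible
            }
            out.add(composite.substring(prefix.length()));
        }
    }

    // The scan must cover both flushed and unflushed state and merge them.
    List<String> keysForWindow(long window) {
        String p = prefix(window);
        TreeSet<String> keys = new TreeSet<>();
        scan(flushed, p, keys);
        scan(unflushed, p, keys);
        return new ArrayList<>(keys);
    }
}
```

In a real store the unflushed part would not necessarily be available as a 
sorted structure, which is where the difficulty mentioned above comes from.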

Using another spillable data structure to store the keys for each window is 
possible, but it also comes with drawbacks, since it duplicates information. 
One of them is obviously performance, since it increases HDFS usage. It also 
needs to use the same store, as otherwise there would be two resources that 
need to be updated atomically.
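
As a rough sketch of the "same store" option, the following keeps the data 
entries and the per-window key index in one sorted map under different key 
prefixes, so that both are written in one step. Everything here (class name, 
the "d/" and "i/" prefixes, the in-memory TreeMap, purging by window id) is an 
assumption for illustration. It does not model ManagedState, its WAL, or its 
time-based purging.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical illustration only; names and key layout are assumptions.
class SharedStoreSketch {
    private static final String DATA = "d/";  // d/<window>/<key> -> value
    private static final String INDEX = "i/"; // i/<window>/<key> -> empty marker

    // A single store holds both collections, separated by key prefix.
    private final TreeMap<String, String> store = new TreeMap<>();

    // Fixed-width hex so that string order matches window order (non-negative ids).
    private static String win(long window) {
        return String.format("%016x", window) + "/";
    }

    // Data entry and index entry are built as one batch and applied together.
    synchronized void put(long window, String key, String value) {
        TreeMap<String, String> batch = new TreeMap<>();
        batch.put(DATA + win(window) + key, value);
        batch.put(INDEX + win(window) + key, "");
        store.putAll(batch);
    }

    synchronized String get(long window, String key) {
        return store.get(DATA + win(window) + key);
    }

    // Reads the keys of a window from the index entries only.
    synchronized List<String> keysForWindow(long window) {
        String p = INDEX + win(window);
        List<String> keys = new ArrayList<>();
        for (String composite : store.tailMap(p, true).keySet()) {
            if (!composite.startsWith(p)) {
                break;
            }
            keys.add(composite.substring(p.length()));
        }
        return keys;
    }

    // Drops data and index entries of every window with an id below the given one.
    synchronized void purgeBelow(long window) {
        store.subMap(DATA, true, DATA + win(window), false).clear();
        store.subMap(INDEX, true, INDEX + win(window), false).clear();
    }

    synchronized int size() {
        return store.size();
    }
}
```

The duplication is visible in size(): every logical entry occupies two slots 
in the store, which is the extra storage cost mentioned above.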

Assuming that both collections can use the same store (key prefix) and share 
the same WAL and data files, we then need to see whether time-based purging 
will still work as intended. David?
  

> implement scalable windowed storage
> -----------------------------------
>
>                 Key: APEXMALHAR-2130
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2130
>             Project: Apache Apex Malhar
>          Issue Type: Task
>            Reporter: bright chen
>            Assignee: David Yan
>
> This feature is used for supporting windowing.
> The storage needs to have the following features:
> 1. Spillable key value storage (integrate with APEXMALHAR-2026)
> 2. Upon checkpoint, it saves a snapshot for the entire data set with the 
> checkpointing window id.  This should be done incrementally (ManagedState) to 
> avoid wasting space with unchanged data
> 3. When recovering, it takes the recovery window id and restores to that 
> snapshot
> 4. When a window is committed, all windows with a lower ID should be purged 
> from the store.
> 5. It should implement the WindowedStorage and WindowedKeyedStorage 
> interfaces, and because of 2 and 3, we may want to add methods to the 
> WindowedStorage interface so that the implementation of WindowedOperator can 
> notify the storage of checkpointing, recovering and committing of a window.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)