[ https://issues.apache.org/jira/browse/APEXMALHAR-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450119#comment-15450119 ]
David Yan commented on APEXMALHAR-2130:
---------------------------------------
[~timothyfarkas] Yes, in theory we can do what you said with two spillable data
structures, using a LinkedList instead of an ArrayList, but that is not ideal.
TFile and DTFile already support returning an iterator over all entries that are
greater than or equal to a given key, and we should use that to get the list of
keys for a given window from what is effectively a Map<Pair<Window, K>, V>. We
just need to expose that functionality in managed state and assign the time
bucket based on the event Window.
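A minimal sketch of the lookup described above, assuming a composite key ordered by window first and then by user key. A java.util.TreeMap stands in for the sorted TFile/DTFile, and `WindowedKey`, `keysForWindow`, and the `long` window start are hypothetical names chosen for illustration; this is not the managed state API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WindowedKeyLookup {
  /** Hypothetical composite key: window start timestamp first, then the user key. */
  static final class WindowedKey implements Comparable<WindowedKey> {
    final long windowStart;
    final String key;

    WindowedKey(long windowStart, String key) {
      this.windowStart = windowStart;
      this.key = key;
    }

    @Override
    public int compareTo(WindowedKey other) {
      int c = Long.compare(windowStart, other.windowStart);
      return c != 0 ? c : key.compareTo(other.key);
    }
  }

  // Assumption: a TreeMap plays the role of a sorted on-disk file (TFile/DTFile).
  private final TreeMap<WindowedKey, Long> store = new TreeMap<>();

  void put(long windowStart, String key, long value) {
    store.put(new WindowedKey(windowStart, key), value);
  }

  /** Returns all keys of one window by seeking to the first entry >= (window, ""). */
  List<String> keysForWindow(long windowStart) {
    List<String> keys = new ArrayList<>();
    // "" is the smallest String, so the seek lands on the window's first entry.
    WindowedKey seek = new WindowedKey(windowStart, "");
    for (Map.Entry<WindowedKey, Long> e : store.tailMap(seek, true).entrySet()) {
      if (e.getKey().windowStart != windowStart) {
        break; // entries are sorted, so a new window means no more matches
      }
      keys.add(e.getKey().key);
    }
    return keys;
  }

  public static void main(String[] args) {
    WindowedKeyLookup s = new WindowedKeyLookup();
    s.put(1000L, "b", 2L);
    s.put(1000L, "a", 1L);
    s.put(2000L, "a", 5L);
    System.out.println(s.keysForWindow(1000L)); // [a, b]
    System.out.println(s.keysForWindow(2000L)); // [a]
    System.out.println(s.keysForWindow(3000L)); // []
  }
}
```

The point of the ordering is that one seek plus a forward scan replaces a second per-window index structure: the scan stops as soon as it reaches the next window.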
> implement scalable windowed storage
> -----------------------------------
>
> Key: APEXMALHAR-2130
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2130
> Project: Apache Apex Malhar
> Issue Type: Task
> Reporter: bright chen
> Assignee: David Yan
>
> This feature is used for supporting windowing.
> The storage needs to have the following features:
> 1. Spillable key-value storage (integrate with APEXMALHAR-2026)
> 2. Upon checkpoint, it saves a snapshot of the entire data set with the
> checkpointing window id. This should be done incrementally (ManagedState) to
> avoid wasting space with unchanged data.
> 3. When recovering, it takes the recovery window id and restores that
> snapshot.
> 4. When a window is committed, all windows with a lower ID should be purged
> from the store.
> 5. It should implement the WindowedStorage and WindowedKeyedStorage
> interfaces, and because of 2 and 3, we may want to add methods to the
> WindowedStorage interface so that the implementation of WindowedOperator can
> notify the storage of checkpointing, recovering and committing of a window.
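One possible reading of items 2 to 4, as a minimal in-memory sketch: each checkpoint stores only the keys changed since the previous one, recovery replays those deltas up to the requested window id, and commit folds older deltas into a base snapshot and drops them. The class and the method names (`checkpointed`, `recovered`, `committed`) are hypothetical; this is not Malhar's ManagedState, and it does not spill to disk.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory sketch of incremental checkpointing; names are illustrative. */
public class IncrementalCheckpointStore<K, V> {
  // State as of the last committed checkpoint.
  private final Map<K, V> base = new HashMap<>();
  // Checkpoint window id -> keys changed in that interval (a null value marks a removal).
  private final TreeMap<Long, Map<K, V>> deltas = new TreeMap<>();
  private Map<K, V> current = new HashMap<>(); // live state
  private Map<K, V> dirty = new HashMap<>();   // changes since the last checkpoint

  public void put(K key, V value) {
    current.put(key, value);
    dirty.put(key, value);
  }

  public void remove(K key) {
    current.remove(key);
    dirty.put(key, null); // tombstone, so the removal survives recovery
  }

  public V get(K key) {
    return current.get(key);
  }

  /** Item 2: saves only what changed since the previous checkpoint. */
  public void checkpointed(long windowId) {
    deltas.put(windowId, dirty);
    dirty = new HashMap<>();
  }

  /** Item 3: rebuilds the state as of the given checkpoint window id. */
  public void recovered(long windowId) {
    Map<K, V> state = new HashMap<>(base);
    for (Map<K, V> delta : deltas.headMap(windowId, true).values()) {
      apply(state, delta); // ascending window id order
    }
    deltas.tailMap(windowId, false).clear(); // drop checkpoints after the recovery point
    current = state;
    dirty = new HashMap<>();
  }

  /** Item 4 (one interpretation): folds deltas up to windowId into the base, then purges them. */
  public void committed(long windowId) {
    Map<Long, Map<K, V>> done = deltas.headMap(windowId, true);
    for (Map<K, V> delta : done.values()) {
      apply(base, delta);
    }
    done.clear(); // clearing the view removes the entries from deltas
  }

  private static <A, B> void apply(Map<A, B> state, Map<A, B> delta) {
    for (Map.Entry<A, B> e : delta.entrySet()) {
      if (e.getValue() == null) {
        state.remove(e.getKey());
      } else {
        state.put(e.getKey(), e.getValue());
      }
    }
  }
}
```

Because unchanged keys are never rewritten, the cost shifts to recovery, which replays the deltas newer than the committed base; committing regularly keeps that replay short.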
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)