[ https://issues.apache.org/jira/browse/APEXMALHAR-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450119#comment-15450119 ]
David Yan commented on APEXMALHAR-2130:
---------------------------------------

[~timothyfarkas] Yes, in theory we can do what you said with two spillable data structures, with a LinkedList instead of an ArrayList, but it's not ideal. TFile and DTFile already support returning an iterator that iterates over entries that are greater than or equal to a given key, and we should make use of that to get the list of keys for a given window on the equivalent of Map<Pair<Window, K>, V>. We just need to expose that functionality in managed state and assign the time bucket based on the event Window.

> implement scalable windowed storage
> -----------------------------------
>
>                 Key: APEXMALHAR-2130
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2130
>             Project: Apache Apex Malhar
>          Issue Type: Task
>            Reporter: bright chen
>            Assignee: David Yan
>
> This feature is used for supporting windowing.
> The storage needs to have the following features:
> 1. Spillable key value storage (integrate with APEXMALHAR-2026)
> 2. Upon checkpoint, it saves a snapshot for the entire data set with the
> checkpointing window id. This should be done incrementally (ManagedState) to
> avoid wasting space with unchanged data
> 3. When recovering, it takes the recovery window id and restores to that
> snapshot
> 4. When a window is committed, all windows with a lower ID should be purged
> from the store.
> 5. It should implement the WindowedStorage and WindowedKeyedStorage
> interfaces, and because of 2 and 3, we may want to add methods to the
> WindowedStorage interface so that the implementation of WindowedOperator can
> notify the storage of checkpointing, recovering and committing of a window.
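
As a rough illustration of the range-scan idea in the comment above (seek to the first entry greater than or equal to a key, then read forward to collect the keys of one window), here is a minimal in-memory sketch. It is not Malhar code: a java.util.TreeMap stands in for the sorted TFile/DTFile, and the names WindowKeyScan, WindowedKey and keysForWindow are hypothetical. It also assumes a window can be identified by a single long and that keys are Strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: a TreeMap stands in for the sorted
// on-disk file, and none of these names come from Malhar.
class WindowKeyScan {

    // Composite key ordered by window first, then by key, so that all
    // entries of one window are contiguous in sort order.
    static final class WindowedKey implements Comparable<WindowedKey> {
        final long window;   // assumed: a window is identified by one long
        final String key;

        WindowedKey(long window, String key) {
            this.window = window;
            this.key = key;
        }

        @Override
        public int compareTo(WindowedKey other) {
            int c = Long.compare(window, other.window);
            return c != 0 ? c : key.compareTo(other.key);
        }
    }

    // Collects the keys of one window: seek to the smallest possible
    // composite key for that window, then stop at the first entry that
    // belongs to a later window.
    static <V> List<String> keysForWindow(TreeMap<WindowedKey, V> store, long window) {
        List<String> keys = new ArrayList<>();
        // The empty string sorts before or equal to every other String.
        WindowedKey seekKey = new WindowedKey(window, "");
        for (Map.Entry<WindowedKey, V> e : store.tailMap(seekKey, true).entrySet()) {
            if (e.getKey().window != window) {
                break;
            }
            keys.add(e.getKey().key);
        }
        return keys;
    }

    public static void main(String[] args) {
        TreeMap<WindowedKey, Integer> store = new TreeMap<>();
        store.put(new WindowedKey(2000L, "b"), 3);
        store.put(new WindowedKey(1000L, "a"), 1);
        store.put(new WindowedKey(1000L, "c"), 2);
        store.put(new WindowedKey(3000L, "a"), 4);
        System.out.println(keysForWindow(store, 1000L));
        System.out.println(keysForWindow(store, 2000L));
    }
}
```

The point of the composite ordering is that a single sorted structure answers "which keys exist in this window" with one seek and a short forward scan, so no second spillable structure is needed to track keys per window.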