[
https://issues.apache.org/jira/browse/APEXMALHAR-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337491#comment-15337491
]
ASF GitHub Bot commented on APEXMALHAR-2026:
--------------------------------------------
Github user tweise commented on a diff in the pull request:
https://github.com/apache/apex-malhar/pull/319#discussion_r67594736
--- Diff:
stream/src/main/java/org/apache/apex/malhar/stream/window/impl/DefaultWindowedStorageImpl.java
---
@@ -0,0 +1,120 @@
+package org.apache.apex.malhar.stream.window.impl;
+
+import com.datatorrent.api.StreamCodec;
+import org.apache.apex.malhar.stream.window.Window;
+import org.apache.apex.malhar.stream.window.WindowedStorage;
+
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Set;
+import java.util.TreeMap;
+import java.util.TreeSet;
+
+/**
+ * Created by david on 6/14/16.
+ */
+public class DefaultWindowedStorageImpl<K, V> implements WindowedStorage<K, V>
--- End diff --
In the spirit of designing an interface that allows for efficient storage
operation with large state, can you consider
https://issues.apache.org/jira/browse/APEXMALHAR-2026
as well as managed state?
> Spill-able Datastructures
> -------------------------
>
> Key: APEXMALHAR-2026
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2026
> Project: Apache Apex Malhar
> Issue Type: New Feature
> Reporter: Timothy Farkas
> Assignee: Timothy Farkas
> Labels: roadmap
>
> Add libraries for spooling data structures to a key-value store. There are
> several customer use cases which require spooled data structures.
> 1 - Some operators like AbstractFileInputOperator have ever growing state.
> This is an issue because eventually the state of the operator will grow
> larger than the memory allocated to the operator, which will cause the
> operator to perpetually fail. However if the operator's datastructures are
> spooled then the operator will never run out of memory.
> 2 - Some users have requested the ability to maintain a map as well as a
> list of keys over which to iterate. Most key value stores don't provide this
> functionality. However, with spooled datastructures this functionality can be
> provided by maintaining a spooled map and an iterable set of keys.
> 3 - Some users have requested building graph databases within APEX. This
> would require implementing a spooled graph data structure.
> 4 - Another use case for spooled data structures is database operators.
> Database operators need to write data to a database, but sometimes the
> database is down. In this case most of the database operators repeatedly fail
> until the database comes back up. In order to avoid constant failures the
> database operator needs to write data to a queue when the database is down,
> then when the database is up the operator needs to take data from the queue
> and write it to the database. In the case of a database failure this queue
> will grow larger than the total amount of memory available to the operator,
> so the queue should be spooled in order to prevent the operator from failing.
> 5 - Any operator which needs to maintain a large data structure in memory
> currently needs to have that data serialized and written out to HDFS with
> every checkpoint. This is costly when the data structure is large. If the
> data structure is spooled, then only the changes to the data structure are
> written out to HDFS instead of the entire data structure.
> 6 - Also building an Apex Native database for aggregations requires indices.
> These indices need to take the form of spooled data structures.
> 7 - In the future any operator which needs to maintain a data structure
> larger than the memory available to it will need to spool the data structure.
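
To make use case 2 above concrete, here is a minimal sketch of a map that keeps only a bounded number of entries in memory and spools the rest to a backing store, while still exposing an iterable set of keys. Everything in it is an assumption for illustration: the class name SpooledMapSketch, its methods, and the plain java.util.Map used as a stand-in for the key-value store are hypothetical and are not part of Malhar or of the API proposed in this issue. For brevity the key set itself stays in memory and null values are not supported; a real implementation would presumably spool the keys as well.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a spooled map; not the Malhar API. */
class SpooledMapSketch<K, V> {
  private final int maxInMemory;
  // Stand-in for an external key-value store (assumption for this sketch).
  private final Map<K, V> store;
  // Access-ordered map used as an LRU cache of the in-memory entries.
  private final LinkedHashMap<K, V> cache;
  // All keys ever put, in insertion order, so callers can iterate over them.
  private final Set<K> keys = new LinkedHashSet<>();

  SpooledMapSketch(int maxInMemory, Map<K, V> store) {
    this.maxInMemory = maxInMemory;
    this.store = store;
    this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        if (size() > SpooledMapSketch.this.maxInMemory) {
          // Spool the least recently used entry before dropping it from memory.
          SpooledMapSketch.this.store.put(eldest.getKey(), eldest.getValue());
          return true;
        }
        return false;
      }
    };
  }

  void put(K key, V value) {
    keys.add(key);
    // The in-memory copy becomes the authoritative one.
    store.remove(key);
    cache.put(key, value);
  }

  V get(K key) {
    V value = cache.get(key);
    if (value == null && keys.contains(key)) {
      // Entry was spooled: load it back, which may spool another entry.
      value = store.remove(key);
      if (value != null) {
        cache.put(key, value);
      }
    }
    return value;
  }

  /** Read-only view of all keys, including those whose values are spooled. */
  Set<K> keys() {
    return Collections.unmodifiableSet(keys);
  }

  int inMemorySize() {
    return cache.size();
  }
}
```

With a limit of two in-memory entries, putting three keys leaves the oldest one in the backing store, and reading it back swaps it with the least recently used cached entry. The LRU policy is just one possible choice here; the issue does not prescribe an eviction strategy.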