[jira] [Commented] (APEXMALHAR-2026) Spill-able Datastructures

ASF GitHub Bot (JIRA) Fri, 17 Jun 2016 20:41:25 -0700

    [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337491#comment-15337491
 ]


ASF GitHub Bot commented on APEXMALHAR-2026:
--------------------------------------------

Github user tweise commented on a diff in the pull request:

    https://github.com/apache/apex-malhar/pull/319#discussion_r67594736
  
    --- Diff: stream/src/main/java/org/apache/apex/malhar/stream/window/impl/DefaultWindowedStorageImpl.java ---
    @@ -0,0 +1,120 @@
    +package org.apache.apex.malhar.stream.window.impl;
    +
    +import com.datatorrent.api.StreamCodec;
    +import org.apache.apex.malhar.stream.window.Window;
    +import org.apache.apex.malhar.stream.window.WindowedStorage;
    +
    +import java.util.Collections;
    +import java.util.HashMap;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.TreeMap;
    +import java.util.TreeSet;
    +
    +/**
    + * Created by david on 6/14/16.
    + */
    +public class DefaultWindowedStorageImpl<K, V> implements WindowedStorage<K, V>
    --- End diff --
    
    In the spirit of designing an interface that allows for efficient storage
    operations with large state, can you consider
    https://issues.apache.org/jira/browse/APEXMALHAR-2026
    as well as managed state?


> Spill-able Datastructures
> -------------------------
>
>                 Key: APEXMALHAR-2026
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2026
>             Project: Apache Apex Malhar
>          Issue Type: New Feature
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
>              Labels: roadmap
>
> Add libraries for spooling data structures to a key value store. There are
> several customer use cases which require spooled data structures.
> 1 - Some operators like AbstractFileInputOperator have ever-growing state.
> This is an issue because eventually the state of the operator will grow 
> larger than the memory allocated to the operator, which will cause the 
> operator to perpetually fail. However if the operator's datastructures are 
> spooled then the operator will never run out of memory.
> 2 - Some users have requested the ability to maintain a map as well as a
> list of keys over which to iterate. Most key value stores don't provide this 
> functionality. However, with spooled datastructures this functionality can be 
> provided by maintaining a spooled map and an iterable set of keys.
> 3 - Some users have requested building graph databases within APEX. This 
> would require implementing a spooled graph data structure.
> 4 - Another use case for spooled data structures is database operators.
> Database operators need to write data to a database, but sometimes the
> database is down. In this case most of the database operators repeatedly fail
> until the database comes back up. In order to avoid constant failures the
> database operator needs to write data to a queue when the database is down,
> then when the database is up the operator needs to take data from the queue
> and write it to the database. In the case of a database failure this queue
> can grow larger than the total amount of memory available to the operator,
> so the queue should be spooled in order to prevent the operator from failing.
> 5 - Any operator which needs to maintain a large data structure in memory 
> currently needs to have that data serialized and written out to HDFS with 
> every checkpoint. This is costly when the data structure is large. If the 
> data structure is spooled, then only the changes to the data structure are 
> written out to HDFS instead of the entire data structure.
> 6 - Also, building an Apex-native database for aggregations requires indices.
> These indices need to take the form of spooled data structures.
> 7 - In the future any operator which needs to maintain a data structure 
> larger than the memory available to it will need to spool the data structure.
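
To make use cases 1 and 2 above concrete, here is a minimal sketch of a
spooled map that also keeps an iterable set of keys. It is not the
implementation proposed in this issue: the class name SpillableMapSketch and
its methods are hypothetical, and a plain in-heap HashMap stands in for the
external key value store, which in practice would live off heap or on disk.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: keeps at most maxInMemory entries in a cache and
 * spills the least recently used entry to a backing key value store.
 */
final class SpillableMapSketch<K, V> {
    // Assumption: stand-in for an external key value store.
    private final Map<K, V> store = new HashMap<>();
    // Keys are tracked separately so callers can iterate over them.
    private final Set<K> keys = new LinkedHashSet<>();
    private final LinkedHashMap<K, V> cache;

    SpillableMapSketch(final int maxInMemory) {
        // accessOrder = true makes the eldest entry the least recently used one.
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > maxInMemory) {
                    // Spill the entry before the cache drops it.
                    store.put(eldest.getKey(), eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    void put(K key, V value) {
        keys.add(key);
        store.remove(key); // discard any stale spilled copy
        cache.put(key, value);
    }

    V get(K key) {
        V value = cache.get(key);
        if (value == null && store.containsKey(key)) {
            // Load the spilled entry back; this may spill another entry.
            value = store.remove(key);
            cache.put(key, value);
        }
        return value;
    }

    /** Read-only view of all keys, whether cached or spilled. */
    Set<K> keys() {
        return Collections.unmodifiableSet(keys);
    }

    int inMemorySize() {
        return cache.size();
    }

    int spilledSize() {
        return store.size();
    }
}
```

With a limit of 2, putting three entries leaves two in the cache and one in
the store, while keys() still iterates over all three. A real implementation
would also need serialization and checkpoint handling, which this sketch
leaves out.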



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
