Michael J. Carey created ASTERIXDB-2624:
-------------------------------------------

             Summary: Double-ended temp files for connector buffering
                 Key: ASTERIXDB-2624
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2624
             Project: Apache AsterixDB
          Issue Type: Improvement
          Components: *DB - AsterixDB, HYR - Hyracks, RT - Runtime
            Reporter: Michael J. Carey
            Assignee: Till


Currently, some of the Hyracks connectors persist data as well as passing it on 
- either synchronously (persist and then pass it on) or asynchronously (persist 
but also pass it on).  The goal there is to decouple the senders and receivers 
so that senders are not held up by slow receivers in terms of being able to finish 
their work.  Temp files are used for this purpose.  The problem is that, by the 
end of a stage that involves such a connector, all of the bits that were moved 
have also been written to temp files.

In the event senders and receivers are fairly in sync, it would be nice for 
consumed data to be garbage-collected - i.e., for such a connector to hold only 
as much intermediate state as is needed to cover however far ahead of the 
receiver the sender has gotten.  A "double-ended file" - essentially a 
persistent queue - if such a beast existed - would meet this requirement.

It would be cool to build such a utility for use in Hyracks to address this 
need.  One could imagine doing this as a set of small temp files - the unit of 
creation/collection of intermediate data - kind of like we do for logs.
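
To make the idea concrete, here is a minimal single-threaded sketch of such a 
queue built from small temp files.  Everything in it is hypothetical and is 
not existing Hyracks API: the class name SegmentedSpillQueue, the 
framesPerSegment parameter, the offer/poll methods, and the length-prefixed 
frame layout are all assumptions made for illustration.  A real connector 
would also need synchronization between the sender and receiver threads.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of a "double-ended file"; not part of Hyracks. Not thread-safe. */
final class SegmentedSpillQueue implements AutoCloseable {
    private static final class Segment {
        final Path path;
        final int frames;
        Segment(Path path, int frames) { this.path = path; this.frames = frames; }
    }

    private final Path dir;                  // private temp directory holding the segments
    private final int framesPerSegment;      // unit of creation/collection
    private final Deque<Segment> sealed = new ArrayDeque<>(); // closed segments, oldest first
    private Path writePath;                  // segment currently being appended to
    private DataOutputStream out;
    private int written;                     // frames in the open write segment
    private Segment readSeg;                 // segment currently being consumed
    private DataInputStream in;
    private int consumed;                    // frames already read from readSeg

    SegmentedSpillQueue(int framesPerSegment) {
        if (framesPerSegment < 1) throw new IllegalArgumentException("framesPerSegment < 1");
        this.framesPerSegment = framesPerSegment;
        try {
            this.dir = Files.createTempDirectory("spill-queue");
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    /** Sender side: append one frame, rolling to a new segment file when the current one is full. */
    void offer(byte[] frame) {
        try {
            if (out == null) {
                writePath = Files.createTempFile(dir, "seg", ".tmp");
                out = new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(writePath)));
                written = 0;
            }
            out.writeInt(frame.length);      // length prefix, then payload
            out.write(frame);
            if (++written == framesPerSegment) seal();
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    private void seal() throws IOException {
        out.close();
        sealed.addLast(new Segment(writePath, written));
        out = null;
    }

    /** Receiver side: next frame in FIFO order, or null if empty. Drained segments are deleted. */
    byte[] poll() {
        try {
            if (in == null) {
                // Receiver has caught up: seal the partial write segment so it can be read.
                if (sealed.isEmpty() && out != null) seal();
                if (sealed.isEmpty()) return null;
                readSeg = sealed.removeFirst();
                in = new DataInputStream(new BufferedInputStream(Files.newInputStream(readSeg.path)));
                consumed = 0;
            }
            byte[] frame = new byte[in.readInt()];
            in.readFully(frame);
            if (++consumed == readSeg.frames) {   // segment fully consumed: reclaim its space
                in.close();
                Files.delete(readSeg.path);
                in = null;
            }
            return frame;
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    /** Number of segment files currently occupying disk space. */
    int segmentsOnDisk() {
        return sealed.size() + (out != null ? 1 : 0) + (in != null ? 1 : 0);
    }

    @Override
    public void close() {
        try {
            if (out != null) { out.close(); out = null; Files.deleteIfExists(writePath); }
            if (in != null) { in.close(); in = null; Files.deleteIfExists(readSeg.path); }
            for (Segment s : sealed) Files.deleteIfExists(s.path);
            sealed.clear();
            Files.deleteIfExists(dir);
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }
}
```

In this sketch the space held on disk is bounded by the gap between sender and 
receiver, rounded up to whole segments, rather than by the total amount of 
data moved.  Smaller segments reclaim space sooner at the cost of more file 
creations and deletions.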

The big benefit of this would be SPACE - less pressure on the file system.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
