Michael J. Carey created ASTERIXDB-2624:
-------------------------------------------
Summary: Double-ended temp files for connector buffering
Key: ASTERIXDB-2624
URL: https://issues.apache.org/jira/browse/ASTERIXDB-2624
Project: Apache AsterixDB
Issue Type: Improvement
Components: *DB - AsterixDB, HYR - Hyracks, RT - Runtime
Reporter: Michael J. Carey
Assignee: Till
Currently, some of the Hyracks connectors persist data as well as passing it on,
either synchronously (persist it and then pass it on) or asynchronously (persist
it while also passing it on). The goal there is to decouple senders from
receivers so that senders are not held up by slow receivers in terms of being
able to finish their work. Temp files are used for this purpose. The problem
is that, by the end of a stage that involves such a connector, all of the bits
that were moved have also been written to temp files.
When senders and receivers are fairly in sync, it would be nice for consumed
data to be garbage-collected, i.e., for such a connector to hold only as much
intermediate state as is needed given how far ahead of the receiver the sender
has gotten. A "double-ended file" (essentially a persistent queue), if such a
beast existed, would meet this requirement.
It would be cool to build such a utility for use in Hyracks to address this
need. One could imagine doing this as a set of small temp files, each one being
the unit of creation/collection of intermediate data, kind of like we do for
logs. The big benefit of this would be SPACE: less pressure on the file system.
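A minimal sketch of the "set of small temp files" idea follows, assuming a single process and length-prefixed byte[] frames. Everything in it is hypothetical: `SegmentedSpillQueue`, `append`, `poll`, `framesPerSegment` and the `seg-N.bin` naming are illustrative, not existing Hyracks APIs. The writer appends frames to the tail segment file and starts a new file when it fills. The reader consumes from the head segment and deletes a file once it has read all of it and the writer has moved on. As a result, the files on disk cover roughly the frames the sender is ahead by, give or take about one segment at each end.

```java
import java.io.*;
import java.nio.file.*;
import java.util.ArrayDeque;

/**
 * Hypothetical sketch (not Hyracks code) of a "double-ended file": a persistent
 * FIFO of frames kept as a chain of small segment files. The writer appends to
 * the tail segment; the reader consumes from the head segment and deletes each
 * segment file once it is fully read and the writer has moved past it.
 */
public class SegmentedSpillQueue implements Closeable {

    /** One on-disk segment file and the number of frames written to it. */
    private static final class Segment {
        final Path path;
        int framesWritten;
        Segment(Path path) { this.path = path; }
    }

    private final Path dir;
    private final int framesPerSegment;
    private final ArrayDeque<Segment> segments = new ArrayDeque<>();
    private DataOutputStream tailOut; // writer side, open on the last segment
    private DataInputStream headIn;   // reader side, open on the first segment
    private int headFramesRead;
    private long nextSegmentId;

    public SegmentedSpillQueue(Path dir, int framesPerSegment) {
        if (framesPerSegment <= 0) throw new IllegalArgumentException("framesPerSegment must be > 0");
        this.dir = dir;
        this.framesPerSegment = framesPerSegment;
    }

    /** Convenience factory that spills into a fresh temporary directory. */
    public static SegmentedSpillQueue inTempDirectory(int framesPerSegment) {
        try {
            return new SegmentedSpillQueue(Files.createTempDirectory("spillq"), framesPerSegment);
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    public Path directory() { return dir; }

    /** Number of segment files currently kept on disk. */
    public synchronized int liveSegments() { return segments.size(); }

    /** Appends a frame at the tail, starting a new segment file when the current one is full. */
    public synchronized void append(byte[] frame) {
        try {
            Segment tail = segments.peekLast();
            if (tail == null || tail.framesWritten == framesPerSegment) {
                if (tailOut != null) tailOut.close();
                tail = new Segment(dir.resolve("seg-" + nextSegmentId++ + ".bin"));
                tailOut = new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(tail.path)));
                segments.addLast(tail);
            }
            tailOut.writeInt(frame.length); // length-prefixed frame
            tailOut.write(frame);
            tailOut.flush();                // hand the bytes to the file before the reader can ask for them
            tail.framesWritten++;
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    /** Removes and returns the oldest frame, or null if the reader has caught up with the writer. */
    public synchronized byte[] poll() {
        try {
            while (true) {
                Segment head = segments.peekFirst();
                if (head == null) return null;
                if (headFramesRead < head.framesWritten) {
                    if (headIn == null) {
                        headIn = new DataInputStream(new BufferedInputStream(Files.newInputStream(head.path)));
                    }
                    byte[] frame = new byte[headIn.readInt()];
                    headIn.readFully(frame);
                    headFramesRead++;
                    return frame;
                }
                if (segments.size() == 1) return null; // head is also the tail: wait for the writer
                // Head fully consumed and the writer has moved on: reclaim its disk space.
                if (headIn != null) { headIn.close(); headIn = null; }
                Files.delete(head.path);
                segments.removeFirst();
                headFramesRead = 0;
            }
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    /** Closes streams and deletes remaining segment files (the directory is left to the caller). */
    @Override
    public synchronized void close() {
        try {
            if (tailOut != null) { tailOut.close(); tailOut = null; }
            if (headIn != null) { headIn.close(); headIn = null; }
            for (Segment s : segments) Files.deleteIfExists(s.path);
            segments.clear();
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }
}
```

Blocking and backpressure are deliberately left out: `poll` simply returns `null` when the reader has caught up, where a real connector would wait for the sender. The segment size is the main knob. Smaller segments release space sooner but cost more file creations and deletions, which is the same trade-off log segmentation makes.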
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)