Alexander Ocsa created ARROW-14330:
--------------------------------------
Summary: Create DataHolder that can be used for caching during
exec plans
Key: ARROW-14330
URL: https://issues.apache.org/jira/browse/ARROW-14330
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Affects Versions: 7.0.0
Reporter: Alexander Ocsa
Assignee: Alexander Ocsa
The purpose of this task is to create an ExecNode that provides the following
functionality:
# Obtain heuristics about current memory consumption and enforce a memory
consumption threshold.
# Write an incoming ExecBatch to disk if memory consumption is above the
threshold, and store either the ExecBatch or a handle to its file in a queue.
# Provide an API for pulling an ExecBatch from the queue. It should favor
pulling the batches that are in memory first, and then the ones that are
handles to files.
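
The three requirements above can be sketched roughly as follows. This is only
an illustration under stated assumptions, not the proposed implementation: the
class name DataHolder is taken from the summary, but its methods (Push, Pop,
memory_used, spilled_count), the spill file format, and the use of a plain
std::vector as a stand-in for ExecBatch are all hypothetical. The sketch also
interprets "above the threshold" as "adding this batch would exceed the
threshold".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <deque>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an ExecBatch: a flat vector of 64-bit values.
using Batch = std::vector<std::int64_t>;

// Hypothetical holder: keeps batches in memory up to a byte threshold,
// spills the rest to files, and hands back in-memory batches first.
class DataHolder {
 public:
  DataHolder(std::size_t threshold_bytes, std::string spill_prefix)
      : threshold_bytes_(threshold_bytes),
        spill_prefix_(std::move(spill_prefix)) {}

  // Requirement 1: report how many bytes are currently held in memory.
  std::size_t memory_used() const { return memory_used_; }
  std::size_t spilled_count() const { return on_disk_.size(); }

  // Requirement 2: keep the batch in memory if it fits under the threshold,
  // otherwise write it to a file and queue the file path as its handle.
  bool Push(Batch batch) {
    const std::size_t bytes = batch.size() * sizeof(std::int64_t);
    if (memory_used_ + bytes <= threshold_bytes_) {
      memory_used_ += bytes;
      in_memory_.push_back(std::move(batch));
      return true;
    }
    const std::string path =
        spill_prefix_ + std::to_string(next_file_id_++) + ".bin";
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    const std::uint64_t n = batch.size();
    out.write(reinterpret_cast<const char*>(&n), sizeof(n));
    out.write(reinterpret_cast<const char*>(batch.data()),
              static_cast<std::streamsize>(bytes));
    if (!out) return false;
    on_disk_.push_back(path);
    return true;
  }

  // Requirement 3: pull in-memory batches first, then spilled ones.
  // Returns false when nothing is left or a spilled file cannot be read.
  bool Pop(Batch* out) {
    assert(out != nullptr);
    if (!in_memory_.empty()) {
      *out = std::move(in_memory_.front());
      in_memory_.pop_front();
      memory_used_ -= out->size() * sizeof(std::int64_t);
      return true;
    }
    if (on_disk_.empty()) return false;
    const std::string path = on_disk_.front();
    on_disk_.pop_front();
    std::ifstream in(path, std::ios::binary);
    std::uint64_t n = 0;
    in.read(reinterpret_cast<char*>(&n), sizeof(n));
    Batch batch(static_cast<std::size_t>(n));
    in.read(reinterpret_cast<char*>(batch.data()),
            static_cast<std::streamsize>(n * sizeof(std::int64_t)));
    const bool ok = static_cast<bool>(in);
    in.close();
    std::remove(path.c_str());  // the spill file is single-use
    if (!ok) return false;
    *out = std::move(batch);
    return true;
  }

 private:
  std::size_t threshold_bytes_;
  std::string spill_prefix_;
  std::size_t memory_used_ = 0;
  std::size_t next_file_id_ = 0;
  std::deque<Batch> in_memory_;       // batches held in memory
  std::deque<std::string> on_disk_;   // handles (paths) to spilled batches
};
```

Two separate queues keep the "memory first" rule trivial to enforce, at the
cost of not preserving arrival order across the two groups. The sketch is not
thread-safe; an ExecNode receiving batches from several threads would need
synchronization around both queues and the memory counter.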
PRs to reference
[https://github.com/apache/arrow/pull/11017/]
Discussion around the subject (these are only for ideas; these documents
are not decisions)
[https://www.notion.so/voltrondata/Caches-and-Cache-Data-d8822213fec5402aa691ca76912a5b3d#ca60341a3e3f4b5487f146f545b19b2c]
[https://docs.google.com/document/d/15X0ePnVJqDmT7og1seikZ31Zmtd3KRyJbL33jym5G3A/edit#]