[
https://issues.apache.org/jira/browse/TAJO-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyunsik Choi reassigned TAJO-900:
---------------------------------
Assignee: Hyunsik Choi
> Reducing memory usage during query processing
> ---------------------------------------------
>
> Key: TAJO-900
> URL: https://issues.apache.org/jira/browse/TAJO-900
> Project: Tajo
> Issue Type: Bug
> Components: physical operator, storage
> Reporter: Hyunsik Choi
> Assignee: Hyunsik Choi
> Fix For: 0.9.0
>
>
> Currently, tuples are implemented as Java objects that internally hold Datum
> objects. Because the current Tuple structure lives in the JVM heap, it is
> hard to control memory usage and impossible to predict garbage collection.
> This problem usually becomes severe when Tajo processes very large data on a
> relatively small cluster with many grouping or join keys.
> I've run various tests and built a prototype to show that this problem can
> be eliminated.
> The main idea is as follows:
> * Do not use the Datum class in expression evaluation. Instead, we should
> use Java primitive type values
> ** It will significantly reduce object creation and memory usage
> * Redesign Tuple using direct memory allocation (DirectByteBuffer or
> Unsafe.allocateMemory)
> ** It allows each worker to control memory usage during in-memory
> operations such as sort and hash aggregation/join.
> ** It enables column values to be stored in adjacent memory, improving
> cache locality.
> To achieve this, we should do the following:
> * Implement an alternative to the EvalNode framework
> * Design a new tuple data structure using direct memory allocation
> * Refactor the in-memory sort, hash aggregation, and hash join operators
> This is an umbrella issue. I'll create subtasks, and I've already started
> some issues. I'll use this JIRA issue to track them.
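
To make the proposal more concrete, here is a minimal sketch of the off-heap idea, assuming a fixed-width row of one long key and one double value. The class OffHeapRowSketch and all of its methods are hypothetical names chosen for illustration; this is not Tajo's tuple design or API. It only shows how a direct ByteBuffer lets a worker reserve a fixed memory budget up front, keep row values in adjacent memory, and read them back as Java primitives without creating per-value objects.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only: fixed-width rows (long key, double value)
// packed back to back in a single direct (off-heap) buffer.
final class OffHeapRowSketch {
    private static final int KEY_OFFSET = 0;
    private static final int VALUE_OFFSET = Long.BYTES;
    private static final int ROW_WIDTH = Long.BYTES + Double.BYTES;

    private final ByteBuffer buffer;
    private final int capacityRows;
    private int rowCount = 0;

    OffHeapRowSketch(int capacityRows) {
        this.capacityRows = capacityRows;
        // The memory budget is fixed here and allocated outside the Java heap.
        this.buffer = ByteBuffer
                .allocateDirect(Math.multiplyExact(capacityRows, ROW_WIDTH))
                .order(ByteOrder.nativeOrder());
    }

    // Appends a row; returns its index, or -1 when the budget is exhausted
    // (a real operator might spill to disk at this point).
    int append(long key, double value) {
        if (rowCount == capacityRows) {
            return -1;
        }
        int base = rowCount * ROW_WIDTH;
        buffer.putLong(base + KEY_OFFSET, key);
        buffer.putDouble(base + VALUE_OFFSET, value);
        return rowCount++;
    }

    long getKey(int row) {
        return buffer.getLong(checkedBase(row) + KEY_OFFSET);
    }

    double getValue(int row) {
        return buffer.getDouble(checkedBase(row) + VALUE_OFFSET);
    }

    // Aggregates with primitives only; no per-value objects are created.
    double sumValues() {
        double sum = 0.0;
        for (int row = 0; row < rowCount; row++) {
            sum += buffer.getDouble(row * ROW_WIDTH + VALUE_OFFSET);
        }
        return sum;
    }

    int rowCount() {
        return rowCount;
    }

    long bytesReserved() {
        return buffer.capacity();
    }

    private int checkedBase(int row) {
        if (row < 0 || row >= rowCount) {
            throw new IndexOutOfBoundsException("row " + row);
        }
        return row * ROW_WIDTH;
    }
}
```

In this sketch, append returning -1 is the point where an in-memory sort or hash operator could decide to spill, since the worker knows exactly how many bytes it has reserved. Variable-length columns, null handling, and Unsafe-based allocation are left out on purpose.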