[ 
https://issues.apache.org/jira/browse/TAJO-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyunsik Choi updated TAJO-900:
------------------------------

    Issue Type: Improvement  (was: Bug)

> Reducing memory usage during query processing
> ---------------------------------------------
>
>                 Key: TAJO-900
>                 URL: https://issues.apache.org/jira/browse/TAJO-900
>             Project: Tajo
>          Issue Type: Improvement
>          Components: physical operator, storage
>            Reporter: Hyunsik Choi
>            Assignee: Hyunsik Choi
>             Fix For: 0.9.0
>
>
> Currently, we have used tuple structures implemented as Java objects. It 
> internally uses Datum objects. Current Tuple structure occupies in JVM heap 
> space. As a result, it is hard to control memory usage, and it is impossible 
> to predict garbage collection. This problem usually becomes severe when Tajo 
> deals with very large data in relatively small cluster and lots of grouping 
> or join keys.
> I've tried various tests and I made some prototype to show the possibility to 
> eliminate this problem. 
> The main idea is as follows:
>  * Do not use Datum class in expression evaluation. Instead, we should use 
> java primitive type values
>    ** It will significantly reduces object creations and memory usages
>  * Redesign Tuple using direct memory allocation (DirectByteBuffer or 
> Unsafe.allocateMemory)
>    ** It allows each worker to control memory usages during in-memory 
> operations like sort and hash aggregation/joins.
>    ** It enables column values to be stored in adjacent memory, improving 
> cache locality.
> In order to achieve the above idea, we should do as follows:
>  * implement an alternative to EvalNode framework
>  * Design new tuple data structure using direct memory allocation 
>  * Refactor in-memory sort, hash aggregation, hash join operators
> This is an umbrella issue. I'll create subtasks, and I've already started 
> some issues. I'll use this jira to track them.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to