[ 
https://issues.apache.org/jira/browse/TAJO-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyunsik Choi updated TAJO-900:
------------------------------

    Description: 
Currently, we have used tuple structures implemented as Java objects. It 
internally uses Datum objects. Current Tuple structure occupies in JVM heap 
space. As a result, it is hard to control memory usage, and it is impossible to 
predict garbage collection. This problem usually becomes severe when Tajo deals 
with very large data in relatively small cluster and lots of grouping or join 
keys.

I've tried various tests and I made some prototype to show the possibility to 
eliminate this problem. 

The main idea is as follows:
 * Do not use Datum class in expression evaluation. Instead, we should use java 
primitive type values
   ** It will significantly reduces object creations and memory usages
 * Redesign Tuple using direct memory allocation (DirectByteBuffer or 
Unsafe.allocateMemory)
   ** It allows each worker to control memory usages during in-memory 
operations like sort and hash aggregation/joins.
   ** It enables column values to be stored in adjacent memory, improving cache 
locality.

In order to achieve the above idea, we should do as follows:
 * implement an alternative (i.e., runtime byte code generation) to EvalNode 
framework in order to avoid use of Datum
 * Design new tuple data structure using direct memory allocation 
 * Refactor in-memory sort, hash aggregation, hash join operators

This is an umbrella issue. I'll create subtasks, and I've already started some 
issues. I'll use this jira to track them.

  was:
Currently, we have used tuple structures implemented as Java objects. It 
internally uses Datum objects. Current Tuple structure occupies in JVM heap 
space. As a result, it is hard to control memory usage, and it is impossible to 
predict garbage collection. This problem usually becomes severe when Tajo deals 
with very large data in relatively small cluster and lots of grouping or join 
keys.

I've tried various tests and I made some prototype to show the possibility to 
eliminate this problem. 

The main idea is as follows:
 * Do not use Datum class in expression evaluation. Instead, we should use java 
primitive type values
   ** It will significantly reduces object creations and memory usages
 * Redesign Tuple using direct memory allocation (DirectByteBuffer or 
Unsafe.allocateMemory)
   ** It allows each worker to control memory usages during in-memory 
operations like sort and hash aggregation/joins.
   ** It enables column values to be stored in adjacent memory, improving cache 
locality.

In order to achieve the above idea, we should do as follows:
 * implement an alternative (i.e., runtime byte code generation) to EvalNode 
framework
 * Design new tuple data structure using direct memory allocation 
 * Refactor in-memory sort, hash aggregation, hash join operators

This is an umbrella issue. I'll create subtasks, and I've already started some 
issues. I'll use this jira to track them.


> Reducing memory usage during query processing
> ---------------------------------------------
>
>                 Key: TAJO-900
>                 URL: https://issues.apache.org/jira/browse/TAJO-900
>             Project: Tajo
>          Issue Type: Improvement
>          Components: physical operator, storage
>            Reporter: Hyunsik Choi
>            Assignee: Hyunsik Choi
>             Fix For: 0.9.0
>
>
> Currently, we have used tuple structures implemented as Java objects. It 
> internally uses Datum objects. Current Tuple structure occupies in JVM heap 
> space. As a result, it is hard to control memory usage, and it is impossible 
> to predict garbage collection. This problem usually becomes severe when Tajo 
> deals with very large data in relatively small cluster and lots of grouping 
> or join keys.
> I've tried various tests and I made some prototype to show the possibility to 
> eliminate this problem. 
> The main idea is as follows:
>  * Do not use Datum class in expression evaluation. Instead, we should use 
> java primitive type values
>    ** It will significantly reduces object creations and memory usages
>  * Redesign Tuple using direct memory allocation (DirectByteBuffer or 
> Unsafe.allocateMemory)
>    ** It allows each worker to control memory usages during in-memory 
> operations like sort and hash aggregation/joins.
>    ** It enables column values to be stored in adjacent memory, improving 
> cache locality.
> In order to achieve the above idea, we should do as follows:
>  * implement an alternative (i.e., runtime byte code generation) to EvalNode 
> framework in order to avoid use of Datum
>  * Design new tuple data structure using direct memory allocation 
>  * Refactor in-memory sort, hash aggregation, hash join operators
> This is an umbrella issue. I'll create subtasks, and I've already started 
> some issues. I'll use this jira to track them.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to