[
https://issues.apache.org/jira/browse/TAJO-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632745#comment-14632745
]
ASF GitHub Bot commented on TAJO-1343:
--------------------------------------
Github user jihoonson commented on the pull request:
https://github.com/apache/tajo/pull/634#issuecomment-122641665
I've tested this patch on my laptop using YourKit.
Here are some highlights of the results.
### Query and data
I used the following query on TPC-H data set with the scale factor of 1.
```
default> select count(distinct l_partkey) from lineitem;
```
### 1. Tuple creation
The following numbers were captured at a random point while executing the second
phase of distinct aggregation, so they do not make an exact comparison, but I
think they are still valuable.
#### Master
| Class | Objects | Shallow Size (bytes) | Retained Size (bytes) |
| ------------- |:-------------:| ------------------:| -------------------:|
|org.apache.tajo.storage.VTuple | 3,899,275 | 93,582,600 | 112,627,520 |
|org.apache.tajo.datum.Datum[] | 3,899,275 | 87,335,008 | 96,857,392 |
#### TAJO-1343
| Class | Objects | Shallow Size (bytes) | Retained Size (bytes) |
| ------------- |:-------------:| ------------------:| -------------------:|
|org.apache.tajo.engine.planner.physical.KeyTuple | 2,573,412 | 82,349,184 | 95,913,472 |
|org.apache.tajo.storage.VTuple | 398,023 | 9,552,552 | 9,552,984 |
|org.apache.tajo.datum.Datum[] | 2,971,439 | 71,314,440 | 78,096,672 |
As can be seen in the above results, the total retained size of tuple instances
with TAJO-1343 (KeyTuple + VTuple, 105,466,456 bytes) is smaller than that of
VTuple with master (112,627,520 bytes). The difference is 7,161,064 bytes.
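The reduction above comes from retaining only the grouping-key columns per remembered row instead of cloning the whole row. The following is a minimal sketch of that idea, with hypothetical names (`fullRowCopy`, `keyOnlyCopy`) standing in for Tajo's actual tuple-copy code:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not Tajo's actual classes: during distinct aggregation,
// each incoming row must be remembered to deduplicate keys. Cloning the whole
// row retains every column, while copying only the key columns (the KeyTuple
// approach in TAJO-1343) retains far less per remembered row.
public class KeyTupleSketch {
    // A row is just an Object[] here; real Tajo uses Datum[].
    static Object[] fullRowCopy(Object[] row) {
        return Arrays.copyOf(row, row.length);         // retains all columns
    }

    static Object[] keyOnlyCopy(Object[] row, int[] keyIdx) {
        Object[] key = new Object[keyIdx.length];      // retains key columns only
        for (int i = 0; i < keyIdx.length; i++) {
            key[i] = row[keyIdx[i]];
        }
        return key;
    }

    public static void main(String[] args) {
        Object[] row = {42L, "payload column", 3.14, "more payload"};
        int[] keyIdx = {0};                            // distinct on column 0 only
        Set<Object> seen = new HashSet<>();
        seen.add(Arrays.asList(keyOnlyCopy(row, keyIdx)));
        System.out.println(fullRowCopy(row).length);            // 4 columns retained
        System.out.println(keyOnlyCopy(row, keyIdx).length);    // 1 column retained
    }
}
```

With wide rows like `lineitem`, keeping one column instead of all of them per deduplicated key is what shrinks the retained size in the table above.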
### 2. Memory usage
The following graphs show the changes of memory usage during query
execution.
##### < Memory usage with master >

##### < Memory usage with TAJO-1343 >

The gray part in each graph represents the change of memory usage during
query execution.
As can be seen in the graphs, memory usage changes more gracefully with
TAJO-1343 than with master.
The following numbers were captured when the query finished, which is when they
reach their maximum.
| branch name | Allocated all pools (MB) | Used PS Eden Space (MB) | Used PS Survivor Space (MB) | Used PS Old Gen (MB) | # of GCs |
| ------------- |:-------------:| ------------------:| -------------------:| -------------------:| -------------------:|
| master | 1006 | 634 | 22 | 63 | 4 |
| TAJO-1343 | 1002 | 518 | 43 | 71 | 3 |
The amount of ```allocated all pools``` with TAJO-1343 is similar to that of
master. However, ```used PS eden space``` with TAJO-1343 is about 100 MB less
than with master, which means that fewer objects are newly created during query
execution.
In this test, the number of GCs is similar for both branches, so it is of less
significance. However, I think the difference in the number of GCs will grow
with more complex queries, thereby affecting query performance more
significantly.
> Improve the memory usage of physical executors
> ----------------------------------------------
>
> Key: TAJO-1343
> URL: https://issues.apache.org/jira/browse/TAJO-1343
> Project: Tajo
> Issue Type: Improvement
> Components: physical operator
> Reporter: Jihoon Son
> Assignee: Jihoon Son
> Priority: Critical
> Fix For: 0.11.0
>
> Attachments: 1343-memory.png, master-memory.png
>
>
> *Introduction*
> Basically, the tuple instance is maintained as a singleton in physical
> operators. However, there are some memory-based operator types which need to
> keep multiple tuples in memory. In such operators, a separate instance must
> be created for each tuple.
> *Problem*
> Currently, there are some temporary routines to avoid unexpected problems due
> to the singleton tuple instance. However, the methodology is inconsistent
> and complex, which causes unexpected bugs.
> *Solution*
> A consistent methodology is needed to handle this problem. Only the operators
> that keep multiple tuples in memory must maintain those tuples with separate
> instances.
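The convention described in the quoted issue could look like the following sketch, with illustrative names (`nextStreaming`, `materialize`) that are not Tajo's API: streaming operators reuse one mutable tuple instance, and only memory-holding operators defensively copy what they keep.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the proposed convention, not Tajo code: a streaming
// operator hands out the same mutable instance on every call, so any operator
// that stores tuples in memory must copy them into separate instances.
public class CopyPolicySketch {
    // Streaming side: one shared tuple, overwritten in place (no allocation).
    static final Object[] shared = new Object[2];

    static Object[] nextStreaming(long a, long b) {
        shared[0] = a;
        shared[1] = b;
        return shared;
    }

    // Memory-holding side (e.g. hash aggregation): copy before storing.
    static List<Object[]> materialize(List<long[]> input) {
        List<Object[]> kept = new ArrayList<>();
        for (long[] row : input) {
            Object[] t = nextStreaming(row[0], row[1]);
            kept.add(Arrays.copyOf(t, t.length));  // separate instance per tuple
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Object[]> kept =
            materialize(List.of(new long[]{1, 2}, new long[]{3, 4}));
        // Without the copy, both kept rows would alias `shared` and both
        // would show the last values written, [3, 4].
        System.out.println(Arrays.toString(kept.get(0)));  // [1, 2]
        System.out.println(Arrays.toString(kept.get(1)));  // [3, 4]
    }
}
```

Making the copy happen only at the materializing operator, rather than ad hoc throughout the plan, is the consistency the issue asks for.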
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)