Github user jihoonson commented on the pull request:
https://github.com/apache/tajo/pull/634#issuecomment-122641665
I've tested this patch on my laptop using YourKit.
Here are some highlights of the results.
### Query and data
I used the following query on the TPC-H data set with a scale factor of 1.
```
default> select count(distinct l_partkey) from lineitem;
```
### 1. Tuple creation
The following numbers were captured at an arbitrary moment during the second
phase of distinct aggregation. So this is not an exact comparison, but I think
it is still informative.
#### Master
| Class | Objects | Shallow Size | Retained Size |
| ------------- |:-------------:| ------------------:| -------------------:|
|org.apache.tajo.storage.VTuple | 3,899,275 | 93,582,600 | 112,627,520 |
|org.apache.tajo.datum.Datum[] | 3,899,275 | 87,335,008 | 96,857,392 |
#### TAJO-1343
| Class | Objects | Shallow Size | Retained Size |
| ------------- |:-------------:| ------------------:| -------------------:|
|org.apache.tajo.engine.planner.physical.KeyTuple | 2,573,412 | 82,349,184 | 95,913,472 |
|org.apache.tajo.storage.VTuple | 398,023 | 9,552,552 | 9,552,984 |
|org.apache.tajo.datum.Datum[] | 2,971,439 | 71,314,440 | 78,096,672 |
As the results above show, the total retained size of the generated tuples
with TAJO-1343 (KeyTuple plus VTuple, 105,466,456) is smaller than that with
master (VTuple, 112,627,520). The difference is 7,161,064.
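The 7,161,064 figure can be reproduced from the retained sizes of the tuple classes in the two tables above. The following is a minimal Python sketch, assuming the sizes are in bytes and that a tuple's retained size already covers its `Datum[]` array, so the `Datum[]` rows are not added again. The dictionary and variable names are only for illustration.

```python
# Retained sizes copied from the two tables above (assumed to be bytes).
# Assumption: a tuple's retained size already includes its Datum[] array,
# so the Datum[] rows are left out to avoid double counting.
master = {
    "org.apache.tajo.storage.VTuple": 112_627_520,
}
tajo_1343 = {
    "org.apache.tajo.engine.planner.physical.KeyTuple": 95_913_472,
    "org.apache.tajo.storage.VTuple": 9_552_984,
}

master_total = sum(master.values())
patched_total = sum(tajo_1343.values())
difference = master_total - patched_total

print(f"master:     {master_total:,}")
print(f"TAJO-1343:  {patched_total:,}")
print(f"difference: {difference:,}")
```

Since the numbers were captured at an arbitrary moment, the difference is only a rough indication rather than a precise measurement.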
### 2. Memory usage
The following graphs show how memory usage changes during query
execution.
##### < Memory usage with master >

##### < Memory usage with TAJO-1343 >

The gray area in each graph represents the change in memory usage during
query execution.
As the graphs show, memory usage changes more smoothly with TAJO-1343 than
with master.
The following numbers were captured when the query finished, because that is
when they reach their maximum.
| branch name | Allocated all pools (MB) | Used PS Eden Space (MB) | Used PS Survivor Space (MB) | Used PS Old Gen (MB) | # of GCs |
| ------------- | -------------:| ------------------:| ------------------:| -------------------:| -------------------:|
| master | 1006 | 634 | 22 | 63 | 4 |
| TAJO-1343 | 1002 | 518 | 43 | 71 | 3 |
The amount of `allocated all pools` with TAJO-1343 is similar to that
of master. However, `used PS eden space` with TAJO-1343 is about 100 MB
less than that with master. This means that fewer objects are newly created
during query execution.
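The "about 100 MB" figure follows directly from the memory table. The following is a small Python sketch of that arithmetic, assuming all pool columns are in MB. The `usage` dictionary and the `used_total` helper are hypothetical names used only for this illustration.

```python
# Values copied from the memory table above; pool sizes are assumed to be in MB.
usage = {
    "master": {"eden": 634, "survivor": 22, "old_gen": 63, "gcs": 4},
    "TAJO-1343": {"eden": 518, "survivor": 43, "old_gen": 71, "gcs": 3},
}


def used_total(branch):
    """Sum of the used PS pools (eden + survivor + old gen) for one branch."""
    pools = usage[branch]
    return pools["eden"] + pools["survivor"] + pools["old_gen"]


eden_saved = usage["master"]["eden"] - usage["TAJO-1343"]["eden"]
total_saved = used_total("master") - used_total("TAJO-1343")

print(f"eden saved:  {eden_saved} MB")
print(f"total saved: {total_saved} MB")
```

The eden gap is partly offset by the higher survivor and old gen usage with TAJO-1343, so the saving summed over the three used pools is smaller than the eden saving alone.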
In this test, the number of GCs is similar for both branches, so the
difference is not significant. However, I think the gap in the number of GCs
will grow with more complex queries and will then affect query performance
more significantly.