[ 
https://issues.apache.org/jira/browse/TAJO-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632745#comment-14632745
 ] 

ASF GitHub Bot commented on TAJO-1343:
--------------------------------------

Github user jihoonson commented on the pull request:

    https://github.com/apache/tajo/pull/634#issuecomment-122641665
  
    I've tested this patch on my laptop using YourKit.
    Here are some highlights of the results.
    
    ### Query and data
    I used the following query on the TPC-H data set with a scale factor of 1.
    ```
    default> select count(distinct l_partkey) from lineitem;
    ```
    
    ### 1. Tuple creation
    The following numbers were captured at an arbitrary point during the second phase of distinct aggregation, so this is not an exact comparison, but I think it is still useful.
    
    #### Master
    | Class        | Objects    | Shallow Size  | Retained Size |
    | ------------- |:-------------:| ------------------:| -------------------:|
    |org.apache.tajo.storage.VTuple | 3,899,275 | 93,582,600 | 112,627,520 |
    |org.apache.tajo.datum.Datum[] | 3,899,275 | 87,335,008 | 96,857,392 |
    
    #### Tajo-1343
    | Class        | Objects    | Shallow Size  | Retained Size |
    | ------------- |:-------------:| ------------------:| -------------------:|
    |org.apache.tajo.engine.planner.physical.KeyTuple | 2,573,412 | 82,349,184 | 95,913,472 |
    |org.apache.tajo.storage.VTuple | 398,023 | 9,552,552 | 9,552,984 |
    |org.apache.tajo.datum.Datum[] | 2,971,439 | 71,314,440 | 78,096,672 |
    
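    As the results above show, the tuples generated with TAJO-1343 take less memory than those generated with master. In retained size, the tuple classes differ by 7,161,064 bytes: 112,627,520 for VTuple on master versus 95,913,472 + 9,552,984 for KeyTuple and VTuple with TAJO-1343.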
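    
    The 7,161,064 figure matches the retained sizes of the tuple classes in the two tables (VTuple on master against KeyTuple plus VTuple with TAJO-1343). That reading is inferred from the numbers rather than stated explicitly, and the class and constant names below are made up for illustration. A minimal sketch of the arithmetic:
    
    ```java
    public class TupleSizeDiff {
        // Retained sizes in bytes, copied from the two tables above.
        static final long MASTER_VTUPLE = 112_627_520L;
        static final long PATCH_KEYTUPLE = 95_913_472L;
        static final long PATCH_VTUPLE = 9_552_984L;

        // Retained size of tuples on master minus retained size with TAJO-1343.
        static long retainedDifference() {
            return MASTER_VTUPLE - (PATCH_KEYTUPLE + PATCH_VTUPLE);
        }

        public static void main(String[] args) {
            System.out.println(retainedDifference()); // prints 7161064
        }
    }
    ```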
    
    ### 2. Memory usage
    The following graphs show the changes of memory usage during query 
execution.
    
    ##### < Memory usage with master >
    ![memory usage with master](https://issues.apache.org/jira/secure/attachment/12745995/master-memory.png)
    ##### < Memory usage with TAJO-1343 >
    ![memory usage with TAJO-1343](https://issues.apache.org/jira/secure/attachment/12745994/1343-memory.png)
    
    The gray part in each graph represents the change of memory usage during 
query execution.
    As the graphs show, memory usage changes more smoothly with TAJO-1343 than with master.
    
    The following numbers were captured when the query finished, because that is when they reach their maximum.
    
    | branch name | Allocated all pools (MB) | Used PS Eden Space (MB) | Used PS Survivor Space (MB) | Used PS Old Gen (MB) | # of GCs |
    | ------------- | ------------- |:-------------:| ------------------:| -------------------:| -------------------:|
    | master | 1006 | 634 | 22 | 63 | 4 |
    | TAJO-1343 | 1002 | 518 | 43 | 71 | 3 |
    
    The amount of ```allocated all pools``` with TAJO-1343 is similar to that of master. However, ```used PS eden space``` with TAJO-1343 is about 100 MB less than with master (518 vs. 634). This means that fewer new objects are created during query execution.
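    
    The numbers in the table came from YourKit. For anyone who wants to log comparable values from inside the JVM, the standard `java.lang.management` beans expose per-pool usage and GC counts. This is only a sketch: the class name `HeapSnapshot` is hypothetical, and pool names such as "PS Eden Space" depend on the collector in use (they appear with the parallel collector).
    
    ```java
    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;
    import java.lang.management.MemoryUsage;

    public class HeapSnapshot {
        static final long MB = 1024L * 1024L;

        // Sum of collections over all collectors; a collector that does not
        // report a count returns -1 and is skipped.
        static long totalGcCount() {
            long total = 0;
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                long count = gc.getCollectionCount();
                if (count > 0) {
                    total += count;
                }
            }
            return total;
        }

        public static void main(String[] args) {
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                MemoryUsage usage = pool.getUsage();
                if (usage == null) {
                    continue; // the pool is no longer valid
                }
                System.out.printf("%-25s used=%d MB committed=%d MB%n",
                        pool.getName(), usage.getUsed() / MB, usage.getCommitted() / MB);
            }
            System.out.println("# of GCs: " + totalGcCount());
        }
    }
    ```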
    
    In this test, the number of GCs is similar for both branches, so the difference is not significant. However, I expect the gap in the number of GCs to grow with more complex queries, and with it the effect on query performance.


> Improve the memory usage of physical executors
> ----------------------------------------------
>
>                 Key: TAJO-1343
>                 URL: https://issues.apache.org/jira/browse/TAJO-1343
>             Project: Tajo
>          Issue Type: Improvement
>          Components: physical operator
>            Reporter: Jihoon Son
>            Assignee: Jihoon Son
>            Priority: Critical
>             Fix For: 0.11.0
>
>         Attachments: 1343-memory.png, master-memory.png
>
>
> *Introduction*
> Basically, the tuple instance is maintained as a singleton in physical 
> operators. However, some memory-based operator types need to keep multiple 
> tuples in memory. In these operators, a separate instance must be created 
> for each tuple.
> *Problem*
> Currently, there are some temporary workarounds to avoid unexpected problems 
> caused by the singleton tuple instance. However, the approach is 
> inconsistent and complex, which leads to unexpected bugs.
> *Solution*
> A consistent methodology is needed to handle this problem. Only the operators 
> that keep multiple tuples in memory must maintain those tuples with separate 
> instances.
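
The hazard described in the issue can be illustrated with a small, self-contained sketch. None of the classes below are Tajo's. `SimpleTuple`, `ReusingScan` and `buffer` are hypothetical stand-ins that show why an operator that keeps tuples in memory has to copy them when the upstream operator reuses a single instance.

```java
import java.util.ArrayList;
import java.util.List;

public class TupleReuseSketch {
    // Hypothetical stand-in for a tuple; not Tajo's VTuple.
    static final class SimpleTuple {
        final long[] values;
        SimpleTuple(int size) { values = new long[size]; }
        SimpleTuple copy() {
            SimpleTuple t = new SimpleTuple(values.length);
            System.arraycopy(values, 0, t.values, 0, values.length);
            return t;
        }
    }

    // Hypothetical operator that reuses one tuple instance for every row.
    static final class ReusingScan {
        private final long[] rows;
        private final SimpleTuple shared = new SimpleTuple(1);
        private int pos = 0;
        ReusingScan(long... rows) { this.rows = rows; }
        SimpleTuple next() {
            if (pos >= rows.length) {
                return null;
            }
            shared.values[0] = rows[pos++]; // overwrites the previous row
            return shared;
        }
    }

    // A memory-based consumer: keeps every tuple, copying only if asked to.
    static List<Long> buffer(ReusingScan scan, boolean copy) {
        List<SimpleTuple> kept = new ArrayList<>();
        for (SimpleTuple t = scan.next(); t != null; t = scan.next()) {
            kept.add(copy ? t.copy() : t);
        }
        List<Long> out = new ArrayList<>();
        for (SimpleTuple t : kept) {
            out.add(t.values[0]);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(buffer(new ReusingScan(1, 2, 3), false)); // [3, 3, 3]
        System.out.println(buffer(new ReusingScan(1, 2, 3), true));  // [1, 2, 3]
    }
}
```

Without the copy, every buffered entry aliases the same shared instance and ends up holding the last row. Streaming operators that look at one tuple at a time do not need the copy, which is where the saving in object creation comes from.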



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
