[
https://issues.apache.org/jira/browse/TAJO-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360042#comment-14360042
]
Jinho Kim edited comment on TAJO-1383 at 3/13/15 7:32 AM:
----------------------------------------------------------
Github user jinossy commented on the pull request:
https://github.com/apache/tajo/pull/404#issuecomment-78846800
I ran a simple benchmark:
* Cluster: 1 master + 4 workers
* TPC-H 100 GB, part of Q16
{code:sql}
select
p_brand, p_type, p_size, ps_suppkey
from
partsupp ps join part p
on
p.p_partkey = ps.ps_partkey and p.p_brand <> 'Brand#45'
and not p.p_type like 'MEDIUM POLISHED%'
join supplier_tmp s
on
ps.ps_suppkey = s.s_suppkey;
{code}
* This benchmark measures only the join (the broadcast table is "supplier_tmp")
||Broadcast || execution time||
|false | 45 sec|
|true | 33 sec|
|improved | 22 sec|
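As a rough reading of the table above, the relative speedups follow directly from the reported times. The helper below is a hypothetical illustration (`SpeedupSketch` is not part of Tajo) and uses only the three numbers from the table.

```java
import java.util.Locale;

// Hypothetical helper: computes how many times faster one run is than another.
final class SpeedupSketch {
    // Ratio of baseline time to candidate time; values above 1.0 mean the candidate is faster.
    static double speedup(double baselineSec, double candidateSec) {
        if (baselineSec <= 0 || candidateSec <= 0) {
            throw new IllegalArgumentException("times must be positive");
        }
        return baselineSec / candidateSec;
    }

    public static void main(String[] args) {
        double noBroadcast = 45.0; // Broadcast = false
        double broadcast = 33.0;   // Broadcast = true
        double improved = 22.0;    // Broadcast = improved

        System.out.printf(Locale.ROOT, "broadcast vs. no broadcast: %.2fx%n",
                speedup(noBroadcast, broadcast));
        System.out.printf(Locale.ROOT, "improved vs. broadcast:     %.2fx%n",
                speedup(broadcast, improved));
        System.out.printf(Locale.ROOT, "improved vs. no broadcast:  %.2fx%n",
                speedup(noBroadcast, improved));
    }
}
```

These ratios come from a single run per configuration as reported, so they are indicative only.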
was (Author: githubbot):
Github user jinossy commented on the pull request:
https://github.com/apache/tajo/pull/404#issuecomment-78846800
I ran a simple benchmark:
* Cluster: 1 master + 4 workers
* TPC-H 100 GB, part of Q16
```
select
p_brand, p_type, p_size, ps_suppkey
from
partsupp ps join part p
on
p.p_partkey = ps.ps_partkey and p.p_brand <> 'Brand#45'
and not p.p_type like 'MEDIUM POLISHED%'
join supplier_tmp s
on
ps.ps_suppkey = s.s_suppkey;
```
* This benchmark measures only the join (the broadcast table is "supplier_tmp")
Broadcast | execution time
------------ | -------------
false | 45 sec
true | 33 sec
improved | 22 sec
> Improve broadcast table cache
> -----------------------------
>
> Key: TAJO-1383
> URL: https://issues.apache.org/jira/browse/TAJO-1383
> Project: Tajo
> Issue Type: Improvement
> Components: physical operator
> Affects Versions: 0.8.0, 0.9.0, 0.10.0
> Reporter: Jinho Kim
> Assignee: Jinho Kim
> Labels: performance
> Attachments: TAJO-1383.patch
>
>
> Currently, the broadcast implementation keeps the tuples in the scan operator, and it
> creates a duplicated table cache in memory.
> We should improve it.
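
One way to picture the improvement described above is a single per-worker cache that every scan of the same broadcast table shares, instead of each scan operator holding its own copy of the tuples. The Java sketch below is only an illustration under that assumption: `BroadcastCacheSketch`, `getOrLoad`, and `loadCount` are hypothetical names, not taken from the Tajo code base or from TAJO-1383.patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch of a shared broadcast-table cache; not Tajo's actual implementation.
final class BroadcastCacheSketch {
    // One entry per broadcast table, shared by all tasks running in this worker.
    private final Map<String, List<Object[]>> cache = new ConcurrentHashMap<>();
    // Counts how many times a table was actually loaded (for illustration only).
    private final AtomicInteger loads = new AtomicInteger();

    // Returns the cached tuples for tableName, loading them at most once.
    List<Object[]> getOrLoad(String tableName, Supplier<List<Object[]>> loader) {
        return cache.computeIfAbsent(tableName, name -> {
            loads.incrementAndGet();
            // Read-only copy, so tasks sharing the list cannot add or remove tuples.
            return Collections.unmodifiableList(new ArrayList<>(loader.get()));
        });
    }

    int loadCount() {
        return loads.get();
    }

    public static void main(String[] args) {
        BroadcastCacheSketch cache = new BroadcastCacheSketch();
        // Stand-in for scanning the broadcast table; the rows are made-up sample data.
        Supplier<List<Object[]>> scan = () -> Arrays.asList(
                new Object[] {1, "supplier-1"},
                new Object[] {2, "supplier-2"});

        List<Object[]> taskA = cache.getOrLoad("supplier_tmp", scan);
        List<Object[]> taskB = cache.getOrLoad("supplier_tmp", scan);

        System.out.println("same instance shared: " + (taskA == taskB));
        System.out.println("number of loads: " + cache.loadCount());
    }
}
```

`ConcurrentHashMap.computeIfAbsent` runs the loader at most once per key, so concurrent tasks asking for the same table end up with the same in-memory list rather than separate copies. A real implementation would also need eviction when the query finishes, which this sketch leaves out.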
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)