[
https://issues.apache.org/jira/browse/DRILL-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14934470#comment-14934470
]
Rahul Challapalli commented on DRILL-3846:
------------------------------------------
Updating the priority to Critical, as I am seeing performance degradation with
other types of queries as well.
Without Metadata Caching :
{code}
0: jdbc:drill:zk=10.10.100.190:5181> select
a.Obj0_level1.Obj0_level2.Obj0_level3.Obj0_level4.Obj0_level5.Obj0_level6.Obj0_level7.Obj0_level8.Obj0_level9.Obj0_level10.Obj0_level11.Obj0_level12.Obj0_level13.Obj0_level14.tinyint22_level15
from `complex_sparse_50000files` a limit 1;
+---------+
| EXPR$0 |
+---------+
| -21 |
+---------+
1 row selected (11.371 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from (select count(*) from
`complex_sparse_50000files` a group by
a.Obj0_level1.Obj0_level2.Obj0_level3.Obj0_level4.Obj0_level5.Obj0_level6.Obj0_level7.Obj0_level8.Obj0_level9.Obj0_level10.Obj0_level11.Obj0_level12.Obj0_level13.Obj0_level14.tinyint22_level15)
b;
+---------+
| EXPR$0 |
+---------+
| 257 |
+---------+
1 row selected (67.666 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> select sum ( distinct
cast(coalesce(a.Obj0_level1.Obj0_level2.Obj0_level3.Obj0_level4.Obj0_level5.Obj0_level6.Obj0_level7.Obj0_level8.Obj0_level9.Obj0_level10.Obj0_level11.Obj0_level12.Obj0_level13.Obj0_level14.tinyint22_level15,
0) as int)) from `complex_sparse_50000files` a;
+---------+
| EXPR$0 |
+---------+
| -128 |
+---------+
1 row selected (69.016 seconds)
{code}
With Metadata Caching :
{code}
0: jdbc:drill:zk=10.10.100.190:5181> select
a.Obj0_level1.Obj0_level2.Obj0_level3.Obj0_level4.Obj0_level5.Obj0_level6.Obj0_level7.Obj0_level8.Obj0_level9.Obj0_level10.Obj0_level11.Obj0_level12.Obj0_level13.Obj0_level14.tinyint22_level15
from `complex_sparse_50000files` a limit 1;
+---------+
| EXPR$0 |
+---------+
| -21 |
+---------+
1 row selected (53.821 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from (select count(*) from
`complex_sparse_50000files` a group by
a.Obj0_level1.Obj0_level2.Obj0_level3.Obj0_level4.Obj0_level5.Obj0_level6.Obj0_level7.Obj0_level8.Obj0_level9.Obj0_level10.Obj0_level11.Obj0_level12.Obj0_level13.Obj0_level14.tinyint22_level15)
b;
+---------+
| EXPR$0 |
+---------+
| 257 |
+---------+
1 row selected (119.584 seconds)
select sum ( distinct
cast(coalesce(a.Obj0_level1.Obj0_level2.Obj0_level3.Obj0_level4.Obj0_level5.Obj0_level6.Obj0_level7.Obj0_level8.Obj0_level9.Obj0_level10.Obj0_level11.Obj0_level12.Obj0_level13.Obj0_level14.tinyint22_level15,
0) as int)) from `complex_sparse_50000files` a;
+---------+
| EXPR$0 |
+---------+
| -128 |
+---------+
1 row selected (133.967 seconds)
{code}
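To put the timings above side by side, here is a minimal sketch that computes how much slower each query ran with the metadata cache in place. The elapsed times are copied from the sqlline output above; the short query labels and the `slowdown` helper are hypothetical names introduced only for this illustration.

```python
# Elapsed times in seconds, copied from the sqlline output above:
# (without metadata cache, with metadata cache).
# The dictionary keys are shorthand labels, not the literal queries.
timings = {
    "select <nested column> limit 1": (11.371, 53.821),
    "count(*) over group by <nested column>": (67.666, 119.584),
    "sum(distinct cast(coalesce(<nested column>, 0) as int))": (69.016, 133.967),
}


def slowdown(without_cache: float, with_cache: float) -> float:
    """Return how many times slower the cached run was than the uncached run."""
    return with_cache / without_cache


# Slowdown factor per query, rounded to two decimal places.
report = {
    label: round(slowdown(without_cache, with_cache), 2)
    for label, (without_cache, with_cache) in timings.items()
}

for label, factor in report.items():
    print(f"{label}: {factor}x slower with the metadata cache")
```

Each of the three queries comes out slower with the cache, which is consistent with the regression reported for count(*) in the issue description.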
> Metadata Caching : A count(*) query took more time with the cache in place
> --------------------------------------------------------------------------
>
> Key: DRILL-3846
> URL: https://issues.apache.org/jira/browse/DRILL-3846
> Project: Apache Drill
> Issue Type: Bug
> Components: Metadata
> Reporter: Rahul Challapalli
> Fix For: 1.2.0
>
>
> git.commit.id.abbrev=3c89b30
> I have a folder with 10k complex files. The generated cache file is around
> 486 MB. The numbers below indicate a performance regression after the
> metadata cache was generated.
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from
> `complex_sparse_50000files`;
> +----------+
> | EXPR$0 |
> +----------+
> | 1000000 |
> +----------+
> 1 row selected (30.835 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> refresh table metadata
> `complex_sparse_50000files`;
> +-------+---------------------------------------------------------------------+
> | ok | summary |
> +-------+---------------------------------------------------------------------+
> | true | Successfully updated metadata for table complex_sparse_50000files. |
> +-------+---------------------------------------------------------------------+
> 1 row selected (10.69 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from
> `complex_sparse_50000files`;
> +----------+
> | EXPR$0 |
> +----------+
> | 1000000 |
> +----------+
> 1 row selected (47.614 seconds)
> {code}