[
https://issues.apache.org/jira/browse/DRILL-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948216#comment-15948216
]
ASF GitHub Bot commented on DRILL-5394:
---------------------------------------
Github user gparai commented on a diff in the pull request:
https://github.com/apache/drill/pull/802#discussion_r108822233
--- Diff: contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/binary/BinaryTableGroupScan.java ---
@@ -115,8 +112,11 @@ private void init() {
     try (Admin admin = formatPlugin.getConnection().getAdmin();
          RegionLocator locator = formatPlugin.getConnection().getRegionLocator(tableName)) {
       hTableDesc = admin.getTableDescriptor(tableName);
-      tableStats = new MapRDBTableStats(getHBaseConf(), hbaseScanSpec.getTableName());
-
+      // Fetch rowCount only once and cache it in hbaseScanSpec.
+      if (hbaseScanSpec.getRowCount() == hbaseScanSpec.ROW_COUNT_UNKNOWN) {
+        MapRDBTableStats tableStats = new MapRDBTableStats(getHBaseConf(), hbaseScanSpec.getTableName());
+        hbaseScanSpec.setRowCount(tableStats.getNumRows());
--- End diff --
This looks odd. We create the TableStats from information in the ScanSpec,
and then modify that same ScanSpec with the information retrieved from
TableStats. Please also see the comment below regarding Spec mutability.
Should we instead overload the MapRDBTableStats constructor to accept
numRows? That is what the existing constructor ends up computing, except that
it makes a call to the DB client to do so. Then, instead of populating the
ScanSpec, we would populate the tableStats through this new constructor.
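To make the suggestion concrete, here is a rough sketch of the shape I have in
mind. All names in it (TableStatsSketch, ScanSpecSketch, GroupScanSketch, the
LongSupplier lookup) are hypothetical stand-ins, not the actual Drill or MapR-DB
classes, and the lookup merely represents the DB client call. It assumes the
group scan is copied during planning and that copies can reuse the row count
already fetched.

```java
import java.util.function.LongSupplier;

/** Hypothetical stand-in for MapRDBTableStats; not the real Drill class. */
final class TableStatsSketch {
  private final long numRows;

  /** Existing style: the row count comes from a (possibly slow) lookup. */
  TableStatsSketch(LongSupplier rowCountLookup) {
    this(rowCountLookup.getAsLong());
  }

  /** Proposed overload: the caller already knows the count, so no lookup is made. */
  TableStatsSketch(long numRows) {
    if (numRows < 0) {
      throw new IllegalArgumentException("numRows must be non-negative");
    }
    this.numRows = numRows;
  }

  long getNumRows() {
    return numRows;
  }
}

/** Hypothetical scan spec that stays immutable: it only names the table. */
final class ScanSpecSketch {
  private final String tableName;

  ScanSpecSketch(String tableName) {
    this.tableName = tableName;
  }

  String getTableName() {
    return tableName;
  }
}

/** Hypothetical group scan that owns the stats and hands them to its copies. */
final class GroupScanSketch {
  private final ScanSpecSketch spec;
  private final TableStatsSketch tableStats;

  /** First construction: performs the lookup exactly once. */
  GroupScanSketch(ScanSpecSketch spec, LongSupplier rowCountLookup) {
    this.spec = spec;
    this.tableStats = new TableStatsSketch(rowCountLookup);
  }

  /** Copy path: reuses the known count through the new overload. */
  private GroupScanSketch(GroupScanSketch that) {
    this.spec = that.spec;
    this.tableStats = new TableStatsSketch(that.tableStats.getNumRows());
  }

  GroupScanSketch copy() {
    return new GroupScanSketch(this);
  }

  long getRowCount() {
    return tableStats.getNumRows();
  }
}
```

With this shape the spec is never written to after construction, and the row
count travels with the stats object instead.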
> Optimize query planning for MapR-DB tables by caching row counts
> ----------------------------------------------------------------
>
> Key: DRILL-5394
> URL: https://issues.apache.org/jira/browse/DRILL-5394
> Project: Apache Drill
> Issue Type: Improvement
> Components: Query Planning & Optimization, Storage - MapRDB
> Affects Versions: 1.9.0, 1.10.0
> Reporter: Abhishek Girish
> Assignee: Padma Penumarthy
> Labels: MapR-DB-Binary
> Fix For: 1.11.0
>
>
> On large MapR-DB tables, it was observed that the query planning time was
> longer than expected. With DEBUG logs, it was understood that there were
> multiple calls being made to get MapR-DB region locations and to fetch total
> row count for tables.
> {code}
> 2017-02-23 13:59:55,246 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:05,006 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG
> o.a.d.e.planner.logical.DrillOptiq - Function
> ...
> 2017-02-23 14:00:05,031 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:16,438 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG
> o.a.d.e.planner.logical.DrillOptiq - Special
> ...
> 2017-02-23 14:00:16,439 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:28,479 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG
> o.a.d.e.planner.logical.DrillOptiq - Special
> ...
> 2017-02-23 14:00:28,480 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:42,396 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG
> o.a.d.e.planner.logical.DrillOptiq - Special
> ...
> 2017-02-23 14:00:42,397 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:54,609 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG
> o.a.d.e.p.s.h.DefaultSqlHandler - VOLCANO:Physical Planning (49588ms):
> {code}
> We should cache these stats and reuse them wherever they are required during
> query planning. This should help reduce query planning time.