[ 
https://issues.apache.org/jira/browse/DRILL-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948216#comment-15948216
 ] 

ASF GitHub Bot commented on DRILL-5394:
---------------------------------------

Github user gparai commented on a diff in the pull request:

    https://github.com/apache/drill/pull/802#discussion_r108822233
  
    --- Diff: contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/binary/BinaryTableGroupScan.java ---
    @@ -115,8 +112,11 @@ private void init() {
         try (Admin admin = formatPlugin.getConnection().getAdmin();
              RegionLocator locator = formatPlugin.getConnection().getRegionLocator(tableName)) {
           hTableDesc = admin.getTableDescriptor(tableName);
    -      tableStats = new MapRDBTableStats(getHBaseConf(), hbaseScanSpec.getTableName());
    -
    +      // Fetch rowCount only once and cache it in hbaseScanSpec.
    +      if (hbaseScanSpec.getRowCount() == hbaseScanSpec.ROW_COUNT_UNKNOWN) {
    +        MapRDBTableStats tableStats = new MapRDBTableStats(getHBaseConf(), hbaseScanSpec.getTableName());
    +        hbaseScanSpec.setRowCount(tableStats.getNumRows());
    --- End diff --
    
    This looks odd: we create the TableStats from information in the ScanSpec,
    and then modify that same ScanSpec with the information retrieved from
    TableStats. Please also see the comment below regarding Spec mutability.
    
    Should we instead overload the MapRDBTableStats constructor to accept
    numRows? The existing constructor ends up setting the same value, but it
    makes a call to the DB client to get it. Then, instead of populating the
    ScanSpec, we would populate tableStats through the new constructor.
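    
    A minimal sketch of the overload idea, for illustration only. Every type here (TableStatsSketch, RowCountSource, TableStats, CountingSource) is a hypothetical stand-in, not the actual Drill or MapR-DB class, and the table path is made up. It assumes the row count is the only value the stats object needs to carry.

```java
// Hypothetical stand-ins: none of these types are the real Drill or MapR-DB classes.
public class TableStatsSketch {

  /** Stand-in for the DB client call that the existing constructor makes. */
  interface RowCountSource {
    long fetchNumRows(String tableName);
  }

  /** Simplified, immutable stats holder with the proposed overloaded constructor. */
  static final class TableStats {
    private final long numRows;

    // Existing-style constructor: asks the source (a remote call in practice).
    TableStats(RowCountSource source, String tableName) {
      this(source.fetchNumRows(tableName));
    }

    // Proposed overload: reuse a row count that is already known.
    TableStats(long numRows) {
      this.numRows = numRows;
    }

    long getNumRows() {
      return numRows;
    }
  }

  /** Counts lookups so the effect of reusing the row count is visible. */
  static final class CountingSource implements RowCountSource {
    private final long rows;
    int calls = 0;

    CountingSource(long rows) {
      this.rows = rows;
    }

    @Override
    public long fetchNumRows(String tableName) {
      calls++;
      return rows;
    }
  }

  public static void main(String[] args) {
    CountingSource source = new CountingSource(1_000_000L);
    // First construction pays for the lookup.
    TableStats first = new TableStats(source, "/tables/example");
    // Later constructions reuse the known count and make no further lookups.
    TableStats copy = new TableStats(first.getNumRows());
    System.out.println(copy.getNumRows() + " rows, " + source.calls + " lookup(s)");
  }
}
```

    With this shape, the ScanSpec would not need to be mutated: whoever already knows the row count passes it to the stats object directly.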


> Optimize query planning for MapR-DB tables by caching row counts
> ----------------------------------------------------------------
>
>                 Key: DRILL-5394
>                 URL: https://issues.apache.org/jira/browse/DRILL-5394
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Query Planning & Optimization, Storage - MapRDB
>    Affects Versions: 1.9.0, 1.10.0
>            Reporter: Abhishek Girish
>            Assignee: Padma Penumarthy
>              Labels: MapR-DB-Binary
>             Fix For: 1.11.0
>
>
> On large MapR-DB tables, query planning took longer than expected. DEBUG logs 
> showed that multiple calls were being made to get MapR-DB region locations 
> and to fetch the total row count for tables.
> {code}
> 2017-02-23 13:59:55,246 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:05,006 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG o.a.d.e.planner.logical.DrillOptiq - Function
> ...
> 2017-02-23 14:00:05,031 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:16,438 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG o.a.d.e.planner.logical.DrillOptiq - Special
> ...
> 2017-02-23 14:00:16,439 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:28,479 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG o.a.d.e.planner.logical.DrillOptiq - Special
> ...
> 2017-02-23 14:00:28,480 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:42,396 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG o.a.d.e.planner.logical.DrillOptiq - Special
> ...
> 2017-02-23 14:00:42,397 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:54,609 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - VOLCANO:Physical Planning (49588ms):
> {code}
> We should cache these stats and reuse them wherever they are required during 
> query planning. This should help reduce query planning time.



