[ 
https://issues.apache.org/jira/browse/DRILL-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955517#comment-14955517
 ] 

Aman Sinha commented on DRILL-3929:
-----------------------------------

Yes, I was going to ask on the mailing list about the Phoenix work. I can 
look at the GitHub link, but do you have a design doc that you can point me to? 
At a high level, my thought was to do this mostly within Drill, i.e. without 
changing Calcite. But I am open to other ways and will look at Phoenix. 
Assume we create an Elasticsearch index on HBase or MapR-DB tables and use the 
ElasticSearch storage plugin that Andrew contributed. However, any index 
should be pluggable as long as the plugin implements a certain interface. 

Here's the Original Plan:
{code}
                      Filter 
                        |
                   HBaseGroupScan
{code}

Using the appropriate APIs on the underlying DB, get the list of columns that 
are indexed. Then do filter analysis for the index columns, similar to the 
filter analysis done for partitioning columns. Generate the Modified Plan: 
{code}
                   Filter (original filter minus the filters on index cols)
                      |
                   Rowkey based Join (initially this could be HashJoin)
                 /      \
                /        HBaseGroupScan
            Filter
        (with index cols only)
              |
         ElasticSearchGroupScan
{code}
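
To make the filter analysis above concrete, here is a minimal sketch of splitting a filter into the part that goes to the index scan and the part that stays above the join. It assumes the filter is a plain AND of single-column conditions. The names IndexFilterSplit, Conjunct and Split are hypothetical and are not Drill or Calcite classes; a real implementation would work on the planner's expression tree instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch only: these types do not exist in Drill or Calcite. */
class IndexFilterSplit {

    /** One conjunct of an AND-ed filter, e.g. column "city", condition "= 'SF'". */
    static final class Conjunct {
        final String column;
        final String condition;

        Conjunct(String column, String condition) {
            this.column = column;
            this.condition = condition;
        }

        @Override
        public String toString() {
            return column + " " + condition;
        }
    }

    /** Result of the analysis: conjuncts for the index scan and the leftover filter. */
    static final class Split {
        final List<Conjunct> indexConjuncts = new ArrayList<>();
        final List<Conjunct> remainderConjuncts = new ArrayList<>();
    }

    /**
     * Assumption: the filter is a conjunction and each conjunct references one column.
     * Conjuncts on indexed columns go below the index scan; the rest stay on top.
     */
    static Split split(List<Conjunct> conjuncts, Set<String> indexedColumns) {
        Split result = new Split();
        for (Conjunct c : conjuncts) {
            if (indexedColumns.contains(c.column)) {
                result.indexConjuncts.add(c);
            } else {
                result.remainderConjuncts.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical example: only "city" is covered by the external index.
        Set<String> indexed = new HashSet<>(Arrays.asList("city"));
        List<Conjunct> filter = Arrays.asList(
                new Conjunct("city", "= 'SF'"),
                new Conjunct("age", "> 30"));

        Split s = split(filter, indexed);
        System.out.println("index filter:     " + s.indexConjuncts);
        System.out.println("remaining filter: " + s.remainderConjuncts);
    }
}
```

In the Modified Plan, indexConjuncts would correspond to the Filter above ElasticSearchGroupScan and remainderConjuncts to the top-level Filter. Disjunctions and multi-column conditions are not handled by this sketch.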

There are some details to work out:  
 (a) The row keys retrieved from the ElasticSearch index must be passed down to 
the HBaseGroupScan to ensure that a restricted scan is done.  
 (b) We need some mechanism to switch to a full-table scan instead of the 
index-based lookup if the number of row keys is large enough (beyond a certain 
threshold relative to the DB size). The Volcano planner has the concept of a 
ChoosePlan operator for this. I will have to think about how this fits into 
Drill/Calcite.  
 (c) The modified plan will be further optimized by the ElasticSearch storage 
plugin's optimizer rule, where the Filter with index columns will be pushed 
into the ESGroupScan.
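
As a rough illustration of detail (b), the switch could be a simple comparison between the estimated number of row keys from the index and the table size. This is only a sketch under assumptions: the class ScanChooser, the enum ScanKind, and the 10% threshold used in the example are all hypothetical, and nothing here reflects an existing Drill or Calcite API or how ChoosePlan is defined.

```java
/** Hypothetical sketch only: not a Drill or Calcite API. */
class ScanChooser {

    enum ScanKind { INDEX_LOOKUP, FULL_TABLE_SCAN }

    /**
     * Picks the index-based lookup unless the index is expected to return more
     * than maxFraction of the table's rows.
     *
     * @param estimatedRowKeys estimated number of row keys returned by the index
     * @param tableRowCount    estimated number of rows in the primary table
     * @param maxFraction      assumed tunable threshold, e.g. 0.10 for 10%
     */
    static ScanKind choose(long estimatedRowKeys, long tableRowCount, double maxFraction) {
        if (tableRowCount <= 0) {
            // No usable size estimate: fall back to the plan that always works.
            return ScanKind.FULL_TABLE_SCAN;
        }
        double fraction = (double) estimatedRowKeys / tableRowCount;
        return fraction > maxFraction ? ScanKind.FULL_TABLE_SCAN : ScanKind.INDEX_LOOKUP;
    }

    public static void main(String[] args) {
        // Selective filter: few row keys relative to the table.
        System.out.println(choose(1_000L, 1_000_000L, 0.10));
        // Non-selective filter: the index would return a large share of the table.
        System.out.println(choose(400_000L, 1_000_000L, 0.10));
    }
}
```

Whether this decision is made at planning time from estimates or at run time from the actual row-key count is exactly the open question in (b); the sketch does not take a position on that.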

> Support the ability to query database tables using external indices           
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-3929
>                 URL: https://issues.apache.org/jira/browse/DRILL-3929
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Execution - Relational Operators, Query Planning & 
> Optimization
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>
> This is a placeholder for adding support in Drill to query database tables 
> using external indices.  I will add more details about the use case and a 
> preliminary design proposal.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
