[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16613710#comment-16613710
 ] 

Aman Sinha commented on DRILL-6381:
-----------------------------------

The index planning and execution feature has been in development in a private 
repository at MapR. We would like to contribute it to Apache Drill and hope 
that it spurs further development and adoption by the community. The main ideas 
are described below.

The feature is divided into two broad categories:
 # A comprehensive index planning and execution framework which supports 
distributed covering and non-covering indices. Index planning is done for WHERE 
clause filters, sort-based operation such as ORDER BY, GROUP BY (using 
StreamingAggregate) and joins (using MergeJoin). The framework is intended to 
be agnostic to the storage plugins. It provides a clean abstraction layer that 
allows the Drill planner and executor to work with only core Drill artifacts 
while storage plugins provide concrete implementations of the interfaces.
 # A reference implementation with MapR-DB JSON plugin whose backend supports 
secondary indexing. Other similar DB plugins whose backend supports secondary 
indices could potentially use the reference implementation as a guide.

Note that Drill is a query engine; it does not provide CRUD operations either 
on the primary table or indexes. These are assumed to be maintained by the 
respective backend servers to which Drill communicates via storage/format 
plugins.

+*Key design concepts:*+

*Covering index:* 

An index whose index fields plus non-index fields (a.k.a included fields) 
'covers' all the columns referenced in the query. The Drill planner will 
generate a covering index plan (a.k.a index-only) plan where all the columns 
are retrieved from the index after pushing down relevant filter conditions to 
the index scan.

*Non-covering index:* 

An index whose index fields plus included fields only partially covers columns 
referenced in the query. For instance, suppose the index is created as follows: 
index keys:\{a, b}, included fields: \{c} and the query is SELECT d, e FROM T 
WHERE a > 10 AND b < 20. In this case, since columns d, e are not present in 
the index at all, this is a non-covering index. For such indexes, the Drill 
planner will generate a non-covering plan where only the row ids are fetched 
from the index by pushing down the WHERE clause filters and the rest of the 
columns are fetched after a join-back to the primary table. The join-back is 
performed using the row ids. A related notion is that of Global index: Drill 
planner assumes indexes are global in nature, i.e the index blocks are not 
necessarily co-located with the primary table's data blocks. This is the most 
general case since an index may be quite large and in order to fully utilize 
the cluster resources it is best to have it fully distributed.

*Functional index:* 

An index which is created not on the base table columns but on 
functions/expressions. Currently, only CAST functions have been tested since 
these are most commonly used in Drill views. If the filter condition is 'WHERE 
CAST(zip_code as BIGINT) = 95120 and a functional index exists on CAST(zip_code 
as BIGINT), then the Drill planner will leverage such indexes as long as they 
are exposed through appropriate metadata interfaces.

*Range partitioning:* 

This is applicable for non-covering indexes. Since the index is typically 
distributed across multiple nodes in the cluster, once we retrieve the row ids 
from a particular node, we have to send each row id to the appropriate 
destination node which contains primary table data for that row. This is done 
through range partitioning. The Drill executor contains a special operator 
(RangePartitionRecordBatch) and a corresponding exchange 
(RangePartitionExchange) for this purpose. For example, suppose the filter 
condition is WHERE state IN ('CA', 'TX', NY'). Since the primary table data for 
these states may be spread out over multiple nodes, once the row ids are 
fetched from the index, they have to be grouped into separate 'ranges' such 
that co-located row ids can be sent to the same node.

*Rowkey Join and Restricted Scan (skip scan):* 

A Rowkey (a.k.a rowid) join is used whenever we need to fetch the rows from 
primary table for a set of row ids retrieved from the index. Note that this is 
not a real join but more of a lookup based on rowid. This is a random I/O 
operation from the primary table since each row id may access a separate data 
block from the table. The Drill executor contains a special operator 
(RowKeyJoinBatch) for doing the join-back. The right side of the RowKeyJoin is 
the index sub-plan. The left side contains a special type of scan called 
Restricted Scan (or skip scan). The storage plugin must implement this scan 
where a list of row ids can be submitted at a time if the backend supports bulk 
fetch. Otherwise, single row ids may be submitted incurring performance 
overhead. Each minor fragment will have an instance of the RowKeyJoin operator 
coupled with a Restricted Scan operator. The DbSubScan interface contains 
methods to specify this coupling.

*Index Intersection:* 

Consider a filter condition 'WHERE a > 10 AND b < 20'. Suppose a single 
composite key index does not exist on both columns but 2 separate indexes exist 
on 'a' and 'b'. Drill planner will create an index intersect plan where row ids 
from each index are retrieved and intersected and only the common row ids are 
used for the join-back to primary table. The index intersection plan is costed 
along with the single index plans and whichever is cheaper is chosen. Note that 
intersectoin is done through a HashJoin operator, so this adds cost (CPU, 
memory, network I/O if the join inputs have to be distributed), so it is 
possible that in some cases the single index plan may turn out cheaper - this 
depends on how much reduction in selectivity happens after intersection.

*Index Metadata:* 

The index metadata is exposed to the Drill planner through the interface 
IndexDescriptor (which is a derived class of IndexDefinition). The interface 
contains methods such as getIndexColumns(), getNonIndexColumns(), 
getCollation() and others which are meant to assist the Drill planner. 
Regarding collation (sortedness), note that range indexes provide collation 
based on the leading prefix of the index columns. However, hash indexes don't 
provide collation. The storage plugin should provide implementations of these 
based on the index metadata exposed by its backend server.

*Statistics and Leading Rowcount:* 

Statistics such as estimated number of rows matching a filter condition and the 
average row size are important parameters for the cost based index selection. 
The Drill planner provides an interface (o.a.d.exec.planner.index.Statistics) 
that contains APIs which should be implemented by the underlying storage plugin 
in order to expose the appropriate statistics (or provide default statistics) 
from the backend. For example, consider the index filter 'WHERE a > 10 AND b < 
20'. The interface method getLeadingRowCount() takes a filter condition as 
argument and should return the 'leading rowcount' for the condition. This is 
the estimated row count of the filter condition based on the leading prefix of 
the index columns. If the index columns are the composite key \{a, b}, then 
leading prefix is \{a, b} for the above filter condition and the row count of 
the full conjunct should be used. However, if the index columns are \{a, c}, 
then only 'a' qualifies as the leading prefix, so the leading row count should 
be estimated for a > 10 only. Similarly, if the index columns are \{b, c} then 
only b < 20 should be used for estimating leading row count. The Drill planner 
will determine the leading prefix based on the filter condition and the index 
metadata.

*Index Selection:* 

The Drill planner in conjunction with Calcite's Volcano planner provides a 
cost-based index selection. In addition, the Drill planner employs a heuristic 
to reduce the search space of planning as described in 'Selectivity and 
thresholds'. For each candidate index it estimates the total cost of the index 
access plus join-back to primary table cost (for non-covering index). Based on 
these, the candidate indexes are ranked based on leading selectivity, collation 
(sortedness) property and whether it is a covering or non-covering index. The 
top 5 indexes per table are chosen (this number is configurable) to be 
considered for plan generation. These may also include index intersection. 
Thus, it is possible that index i1 and i2 may individually not qualify based on 
selectivity but their combined selectivity after intersection could be low 
enough to qualify. Once these indexes are chosen, there are 3 types of plans 
that may be generated: covering index plan, non-covering index plan and index 
intersection plan. The Volcano planner compares the cumulative cost of all of 
these plans along with the original full table scan plan and picks the cheapest 
plan for execution.

*Selectivity and thresholds:* 

For covering index, the Drill planner will generate a covering index plan even 
up to 100 % estimated selectivity. The expectation is that an index-only plan 
is going to be cheaper compared to full table scan due to smaller row widths of 
the index. For non-covering indexes, due to the random I/O nature of the rowkey 
join-back to primary table, the default selectivity threshold is small: 2.5 %. 
This is configurable through the setting 
planner.index.noncovering_selectivity_threshold. If the estimated selectivity 
of the filter condition is above this threshold, the non-covering index plan is 
not generated for that index; the rationale for this is that each new plan adds 
to the search space and increases planning time, so if the estimated row count 
is already high it is unlikely to be chosen and it is better to prune it out 
early. In addition, a global configuration setting: 
planner.enable_index_planning (default is TRUE) enables or disables index 
planning altogether.

> Add capability to do index based planning and execution
> -------------------------------------------------------
>
>                 Key: DRILL-6381
>                 URL: https://issues.apache.org/jira/browse/DRILL-6381
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Execution - Relational Operators, Query Planning &amp; 
> Optimization
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>            Priority: Major
>             Fix For: 1.15.0
>
>
> If the underlying data source supports indexes (primary and secondary 
> indexes), Drill should leverage those during planning and execution in order 
> to improve query performance.  
> On the planning side, Drill planner should be enhanced to provide an 
> abstraction layer which express the index metadata and statistics.  Further, 
> a cost-based index selection is needed to decide which index(es) are 
> suitable.  
> On the execution side, appropriate operator enhancements would be needed to 
> handle different categories of indexes such as covering, non-covering 
> indexes, taking into consideration the index data may not be co-located with 
> the primary table, i.e a global index.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to