[
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16613710#comment-16613710
]
Aman Sinha commented on DRILL-6381:
-----------------------------------
The index planning and execution feature has been in development in a private
repository at MapR. We would like to contribute it to Apache Drill and hope
that it spurs further development and adoption by the community. The main ideas
are described below.
The feature is divided into two broad categories:
# A comprehensive index planning and execution framework which supports
distributed covering and non-covering indices. Index planning is done for WHERE
clause filters, sort-based operation such as ORDER BY, GROUP BY (using
StreamingAggregate) and joins (using MergeJoin). The framework is intended to
be agnostic to the storage plugins. It provides a clean abstraction layer that
allows the Drill planner and executor to work with only core Drill artifacts
while storage plugins provide concrete implementations of the interfaces.
# A reference implementation with MapR-DB JSON plugin whose backend supports
secondary indexing. Other similar DB plugins whose backend supports secondary
indices could potentially use the reference implementation as a guide.
Note that Drill is a query engine; it does not provide CRUD operations either
on the primary table or indexes. These are assumed to be maintained by the
respective backend servers to which Drill communicates via storage/format
plugins.
+*Key design concepts:*+
*Covering index:*
An index whose index fields plus non-index fields (a.k.a included fields)
'covers' all the columns referenced in the query. The Drill planner will
generate a covering index plan (a.k.a index-only) plan where all the columns
are retrieved from the index after pushing down relevant filter conditions to
the index scan.
*Non-covering index:*
An index whose index fields plus included fields only partially covers columns
referenced in the query. For instance, suppose the index is created as follows:
index keys:\{a, b}, included fields: \{c} and the query is SELECT d, e FROM T
WHERE a > 10 AND b < 20. In this case, since columns d, e are not present in
the index at all, this is a non-covering index. For such indexes, the Drill
planner will generate a non-covering plan where only the row ids are fetched
from the index by pushing down the WHERE clause filters and the rest of the
columns are fetched after a join-back to the primary table. The join-back is
performed using the row ids. A related notion is that of Global index: Drill
planner assumes indexes are global in nature, i.e the index blocks are not
necessarily co-located with the primary table's data blocks. This is the most
general case since an index may be quite large and in order to fully utilize
the cluster resources it is best to have it fully distributed.
*Functional index:*
An index which is created not on the base table columns but on
functions/expressions. Currently, only CAST functions have been tested since
these are most commonly used in Drill views. If the filter condition is 'WHERE
CAST(zip_code as BIGINT) = 95120 and a functional index exists on CAST(zip_code
as BIGINT), then the Drill planner will leverage such indexes as long as they
are exposed through appropriate metadata interfaces.
*Range partitioning:*
This is applicable for non-covering indexes. Since the index is typically
distributed across multiple nodes in the cluster, once we retrieve the row ids
from a particular node, we have to send each row id to the appropriate
destination node which contains primary table data for that row. This is done
through range partitioning. The Drill executor contains a special operator
(RangePartitionRecordBatch) and a corresponding exchange
(RangePartitionExchange) for this purpose. For example, suppose the filter
condition is WHERE state IN ('CA', 'TX', NY'). Since the primary table data for
these states may be spread out over multiple nodes, once the row ids are
fetched from the index, they have to be grouped into separate 'ranges' such
that co-located row ids can be sent to the same node.
*Rowkey Join and Restricted Scan (skip scan):*
A Rowkey (a.k.a rowid) join is used whenever we need to fetch the rows from
primary table for a set of row ids retrieved from the index. Note that this is
not a real join but more of a lookup based on rowid. This is a random I/O
operation from the primary table since each row id may access a separate data
block from the table. The Drill executor contains a special operator
(RowKeyJoinBatch) for doing the join-back. The right side of the RowKeyJoin is
the index sub-plan. The left side contains a special type of scan called
Restricted Scan (or skip scan). The storage plugin must implement this scan
where a list of row ids can be submitted at a time if the backend supports bulk
fetch. Otherwise, single row ids may be submitted incurring performance
overhead. Each minor fragment will have an instance of the RowKeyJoin operator
coupled with a Restricted Scan operator. The DbSubScan interface contains
methods to specify this coupling.
*Index Intersection:*
Consider a filter condition 'WHERE a > 10 AND b < 20'. Suppose a single
composite key index does not exist on both columns but 2 separate indexes exist
on 'a' and 'b'. Drill planner will create an index intersect plan where row ids
from each index are retrieved and intersected and only the common row ids are
used for the join-back to primary table. The index intersection plan is costed
along with the single index plans and whichever is cheaper is chosen. Note that
intersectoin is done through a HashJoin operator, so this adds cost (CPU,
memory, network I/O if the join inputs have to be distributed), so it is
possible that in some cases the single index plan may turn out cheaper - this
depends on how much reduction in selectivity happens after intersection.
*Index Metadata:*
The index metadata is exposed to the Drill planner through the interface
IndexDescriptor (which is a derived class of IndexDefinition). The interface
contains methods such as getIndexColumns(), getNonIndexColumns(),
getCollation() and others which are meant to assist the Drill planner.
Regarding collation (sortedness), note that range indexes provide collation
based on the leading prefix of the index columns. However, hash indexes don't
provide collation. The storage plugin should provide implementations of these
based on the index metadata exposed by its backend server.
*Statistics and Leading Rowcount:*
Statistics such as estimated number of rows matching a filter condition and the
average row size are important parameters for the cost based index selection.
The Drill planner provides an interface (o.a.d.exec.planner.index.Statistics)
that contains APIs which should be implemented by the underlying storage plugin
in order to expose the appropriate statistics (or provide default statistics)
from the backend. For example, consider the index filter 'WHERE a > 10 AND b <
20'. The interface method getLeadingRowCount() takes a filter condition as
argument and should return the 'leading rowcount' for the condition. This is
the estimated row count of the filter condition based on the leading prefix of
the index columns. If the index columns are the composite key \{a, b}, then
leading prefix is \{a, b} for the above filter condition and the row count of
the full conjunct should be used. However, if the index columns are \{a, c},
then only 'a' qualifies as the leading prefix, so the leading row count should
be estimated for a > 10 only. Similarly, if the index columns are \{b, c} then
only b < 20 should be used for estimating leading row count. The Drill planner
will determine the leading prefix based on the filter condition and the index
metadata.
*Index Selection:*
The Drill planner in conjunction with Calcite's Volcano planner provides a
cost-based index selection. In addition, the Drill planner employs a heuristic
to reduce the search space of planning as described in 'Selectivity and
thresholds'. For each candidate index it estimates the total cost of the index
access plus join-back to primary table cost (for non-covering index). Based on
these, the candidate indexes are ranked based on leading selectivity, collation
(sortedness) property and whether it is a covering or non-covering index. The
top 5 indexes per table are chosen (this number is configurable) to be
considered for plan generation. These may also include index intersection.
Thus, it is possible that index i1 and i2 may individually not qualify based on
selectivity but their combined selectivity after intersection could be low
enough to qualify. Once these indexes are chosen, there are 3 types of plans
that may be generated: covering index plan, non-covering index plan and index
intersection plan. The Volcano planner compares the cumulative cost of all of
these plans along with the original full table scan plan and picks the cheapest
plan for execution.
*Selectivity and thresholds:*
For covering index, the Drill planner will generate a covering index plan even
up to 100 % estimated selectivity. The expectation is that an index-only plan
is going to be cheaper compared to full table scan due to smaller row widths of
the index. For non-covering indexes, due to the random I/O nature of the rowkey
join-back to primary table, the default selectivity threshold is small: 2.5 %.
This is configurable through the setting
planner.index.noncovering_selectivity_threshold. If the estimated selectivity
of the filter condition is above this threshold, the non-covering index plan is
not generated for that index; the rationale for this is that each new plan adds
to the search space and increases planning time, so if the estimated row count
is already high it is unlikely to be chosen and it is better to prune it out
early. In addition, a global configuration setting:
planner.enable_index_planning (default is TRUE) enables or disables index
planning altogether.
> Add capability to do index based planning and execution
> -------------------------------------------------------
>
> Key: DRILL-6381
> URL: https://issues.apache.org/jira/browse/DRILL-6381
> Project: Apache Drill
> Issue Type: New Feature
> Components: Execution - Relational Operators, Query Planning &
> Optimization
> Reporter: Aman Sinha
> Assignee: Aman Sinha
> Priority: Major
> Fix For: 1.15.0
>
>
> If the underlying data source supports indexes (primary and secondary
> indexes), Drill should leverage those during planning and execution in order
> to improve query performance.
> On the planning side, Drill planner should be enhanced to provide an
> abstraction layer which express the index metadata and statistics. Further,
> a cost-based index selection is needed to decide which index(es) are
> suitable.
> On the execution side, appropriate operator enhancements would be needed to
> handle different categories of indexes such as covering, non-covering
> indexes, taking into consideration the index data may not be co-located with
> the primary table, i.e a global index.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)