GitHub user gparai opened a pull request:
https://github.com/apache/drill/pull/606
Drill-1328: Compute and use statistics in Drill
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gparai/drill Drill-1328-r2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/606.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #606
----
commit fcde922184f87f23424c38f2d921a3261e97555a
Author: Gautam Parai <[email protected]>
Date: 2016-10-06T19:26:10Z
Merge pull request #1 from apache/master
Sync with apache master
commit 7877c934a680e3afb7f6ee1299ff0a47f5091c9d
Author: Cliff Buchanan <[email protected]>
Date: 2014-08-21T21:59:53Z
DRILL-1328: Support table statistics
PRE: Add "append" concept to directory write.
* This is so stats can be stored in [table].stats.drill and be appended to
by writing a new file into the directory.
FUNCS: Statistics functions as UDFs:
Currently using FieldReader to ensure consistent output type so that
Unpivot doesn't get confused. All stats columns should be Nullable, so that
stats functions can return NULL when N/A.
* custom versions of "count" that always return BigInt
* HyperLogLog-based NDV that returns BigInt and works only on VarChars
* HyperLogLog with binary output that only works on VarChars
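For readers unfamiliar with the technique behind the two NDV functions above, here is a minimal, self-contained HyperLogLog sketch. It is not the code in this patch: the class name SimpleHll, the FNV-1a-plus-finalizer hash, and the register layout are all assumptions made for illustration. estimate() stands in for the BigInt-returning variant and toBytes() for the binary-output variant.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative HyperLogLog sketch; hypothetical, not the class used in the patch. */
final class SimpleHll {
    private final int p;            // number of hash bits used as the register index
    private final int m;            // number of registers, 2^p
    private final byte[] registers; // largest rank observed per register

    SimpleHll(int p) {
        if (p < 7 || p > 16) {
            throw new IllegalArgumentException("p must be in [7, 16]");
        }
        this.p = p;
        this.m = 1 << p;
        this.registers = new byte[m];
    }

    /** Adds one string value (the VarChar case); nulls are skipped. */
    void add(String value) {
        if (value == null) {
            return;
        }
        long h = hash64(value.getBytes(StandardCharsets.UTF_8));
        int idx = (int) (h >>> (64 - p)); // top p bits select the register
        long rest = h << p;               // remaining 64 - p bits, left-aligned
        int rank = (rest == 0) ? (64 - p + 1) : Long.numberOfLeadingZeros(rest) + 1;
        if (rank > registers[idx]) {
            registers[idx] = (byte) rank;
        }
    }

    /** NDV estimate as a long, standing in for the BigInt-returning variant. */
    long estimate() {
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2.0, -r);
            if (r == 0) {
                zeros++;
            }
        }
        double alpha = 0.7213 / (1.0 + 1.079 / m); // bias constant, valid for m >= 128
        double raw = alpha * m * (double) m / sum;
        if (raw <= 2.5 * m && zeros > 0) {
            // Small-range correction: fall back to linear counting.
            raw = m * Math.log((double) m / zeros);
        }
        return Math.round(raw);
    }

    /** Copy of the raw registers, standing in for the binary-output variant. */
    byte[] toBytes() {
        return registers.clone();
    }

    // FNV-1a over the bytes, followed by a 64-bit finalizer to spread the bits.
    private static long hash64(byte[] data) {
        long h = 0xcbf29ce484222325L;
        for (byte b : data) {
            h ^= (b & 0xffL);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }
}
```

Because each register only ever keeps a maximum, adding the same value twice leaves the sketch unchanged, which is what makes the structure suitable for distinct-value counting.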
OPS: Updated protobufs for new ops
OPS: Implemented StatisticsAggregate
OPS: Implemented StatisticsUnpivot
ANALYZE: AnalyzeTable functionality
* JavaCC syntax more-or-less copied from LucidDB.
* (Basic) AnalyzePrule: DrillAnalyzeRel -> UnpivotPrel and StatsAggPrel
ANALYZE: Add getMetadataTable() to AbstractSchema
USAGE: Change field access in QueryWrapper
USAGE: Add getDrillTable() to DrillScanRelBase and ScanPrel
* Since ScanPrel does not inherit from DrillScanRelBase, this requires
adding a DrillTable to the constructor.
* This is done so that a custom ReflectiveRelMetadataProvider can access
the DrillTable associated with Logical/Physical scans.
USAGE: Attach DrillTableMetadata to DrillTable.
* DrillTableMetadata represents the data scanned from a corresponding
".stats.drill" table
* In order to avoid doing query execution right after the ".stats.drill"
table is found, metadata is not actually collected until the
MaterializationVisitor is used.
** Currently, the metadata source must be a string (so that a SQL query can
be created). Doing this with a table is probably more complicated.
** Query is set up to extract only the most recent statistics results for
each column.
USAGE: Configure DrillJoinRelBase to use NDV metadata when available.
USAGE: Attach metadata to table
USAGE: Implement Optiq provider
----
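The commit message says the join rel is configured to use NDV metadata when available but does not give the formula. As a hedged illustration only, the sketch below uses the common textbook equi-join estimate, |R| * |S| / max(NDV(R.a), NDV(S.b)). The class name JoinRowEstimate, the method signature, and the fallback selectivity constant are hypothetical and are not taken from the patch.

```java
/** Illustrative equi-join row-count estimate; hypothetical, not the patch's code. */
final class JoinRowEstimate {
    // Placeholder selectivity used when no NDV is known (an assumption for this sketch).
    static final double HYPOTHETICAL_DEFAULT_SELECTIVITY = 0.15;

    /**
     * Estimates the output rows of an equi-join on one column pair.
     * An NDV of zero or less means "statistics not available" for that side.
     */
    static double estimate(double leftRows, double rightRows, long leftNdv, long rightNdv) {
        if (leftRows < 0 || rightRows < 0) {
            throw new IllegalArgumentException("row counts must be non-negative");
        }
        long maxNdv = Math.max(leftNdv, rightNdv);
        if (maxNdv <= 0) {
            // No usable statistics: fall back to a fixed guess.
            return leftRows * rightRows * HYPOTHETICAL_DEFAULT_SELECTIVITY;
        }
        // Textbook estimate: each value on the side with more distinct keys
        // is assumed to match an equal share of rows on the other side.
        return leftRows * rightRows / maxNdv;
    }
}
```

For example, calling JoinRowEstimate.estimate(1000, 200, 100, 50) divides the 200,000-row cross product by the larger NDV, 100.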