[
https://issues.apache.org/jira/browse/DRILL-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16799858#comment-16799858
]
ASF GitHub Bot commented on DRILL-7117:
---------------------------------------
amansinha100 commented on pull request #1715: DRILL-7117: Support creation of
equi-depth histogram for selected data types.
URL: https://github.com/apache/drill/pull/1715
- This PR adds support for creating equi-depth histograms on the following
data types: INT, BIGINT, FLOAT4, FLOAT8, DATE, TIME, TIMESTAMP and BOOLEAN.
No selectivity calculations have been modified yet (that will be done in a
later PR).
- The histogram is built using the t-digest approximation algorithm and
associated data structure.
Please see details in
[DRILL-7117](https://issues.apache.org/jira/browse/DRILL-7117) and the parent
JIRA [DRILL-6992](https://issues.apache.org/jira/browse/DRILL-6992) which
contains a link to the design document.
- The same ANALYZE command used for NDV, etc. will also gather histograms; no
new syntax has been added. For testing, I have done a fair amount of manual
testing using both skewed and uniform distributions and with different data
types. Please see
[DRILL-7117](https://issues.apache.org/jira/browse/DRILL-7117) for the results
of that testing. No unit tests have been added yet because the bucket
boundaries can vary slightly with the underlying t-digest approximation.
Making this repeatable and unit-testable needs some thought, and I will do
this in a follow-up PR.
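
For readers unfamiliar with the term, here is a minimal sketch of what "equi-depth" means: bucket boundaries are chosen so that each bucket holds roughly the same number of values. This is not the code in the PR. It computes exact boundaries by sorting a full copy of the data, whereas the PR relies on the t-digest approximation. The class name `EquiDepthSketch` and the method `boundaries` are hypothetical and are not part of Drill.

```java
import java.util.Arrays;

// Illustrative only: exact equi-depth boundaries from a fully sorted copy
// of the data. The PR uses the t-digest approximation instead of a sort.
class EquiDepthSketch {

    // Returns numBuckets + 1 boundaries such that each bucket holds roughly
    // the same number of values. (Hypothetical helper, not a Drill API.)
    static double[] boundaries(double[] values, int numBuckets) {
        if (values.length == 0 || numBuckets < 1) {
            throw new IllegalArgumentException("need data and at least one bucket");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double[] result = new double[numBuckets + 1];
        for (int i = 0; i <= numBuckets; i++) {
            // Position of the i-th boundary in the sorted data; long math
            // avoids int overflow on large inputs.
            int idx = (int) ((long) i * (sorted.length - 1) / numBuckets);
            result[i] = sorted[idx];
        }
        return result;
    }

    public static void main(String[] args) {
        // Skewed input: many small values and a few large ones.
        double[] data = {1, 1, 1, 2, 2, 3, 5, 8, 13, 21, 34, 55, 89};
        // prints [1.0, 2.0, 5.0, 21.0, 89.0]
        System.out.println(Arrays.toString(boundaries(data, 4)));
    }
}
```

With skewed input the buckets are narrow where values are dense and wide where they are sparse, which is the property that makes equi-depth histograms useful for skewed columns. Because this sketch sorts the data, its boundaries are fully repeatable; an approximate structure gives up that exactness in exchange for bounded memory.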
> Support creation of histograms for numeric data types (except Decimal) and
> date/time/timestamp
> ----------------------------------------------------------------------------------------------
>
> Key: DRILL-7117
> URL: https://issues.apache.org/jira/browse/DRILL-7117
> Project: Apache Drill
> Issue Type: Sub-task
> Components: Query Planning & Optimization
> Reporter: Aman Sinha
> Assignee: Aman Sinha
> Priority: Major
> Fix For: 1.16.0
>
>
> This JIRA is specific to creating histograms for numeric data types: INT,
> BIGINT, FLOAT4, FLOAT8 and their corresponding nullable/non-nullable
> versions. Additionally, since DATE/TIME/TIMESTAMP are internally stored as
> longs, we should allow the same numeric type histogram creation for these
> data types as well.