amansinha100 opened a new pull request #1715: DRILL-7117: Support creation of 
equi-depth histogram for selected data types.
URL: https://github.com/apache/drill/pull/1715
 
   
   - This PR adds support for creating equi-depth histograms on the following 
data types: INT, BIGINT, FLOAT4, FLOAT8, DATE, TIME, TIMESTAMP and BOOLEAN.   
No selectivity calculations have been modified yet (that will be done in a 
later PR).  
   
   - The histogram is built using the t-digest approximation algorithm and 
associated data structure.  
   For details, please see 
[DRILL-7117](https://issues.apache.org/jira/browse/DRILL-7117) and the parent 
JIRA [DRILL-6992](https://issues.apache.org/jira/browse/DRILL-6992), which 
contains a link to the design document. 
   
   - The same ANALYZE command used for NDV, etc. also gathers histograms; no 
new syntax has been added.  For testing, I ran manual tests using both skewed 
and uniform distributions and with different data types.  Please see 
[DRILL-7117](https://issues.apache.org/jira/browse/DRILL-7117) for the results. 
No unit tests have been added yet because the bucket boundaries vary slightly 
with the underlying t-digest approximation.  Making this repeatable and 
unit-testable needs some thought, and I will do it in a follow-up PR. 
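
As a rough illustration of what "equi-depth" means in the description above, the sketch below derives bucket boundaries from exact quantiles of a sorted array, so that each bucket covers roughly the same number of values. This is not the code in this PR: the PR relies on t-digest to approximate the quantiles, whereas this example sorts the full input. The class and method names (`EquiDepthSketch`, `boundaries`) are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of Drill or of this PR.
final class EquiDepthSketch {

    /**
     * Returns numBuckets + 1 boundaries such that each bucket holds
     * roughly the same number of input values (equi-depth).
     * Boundaries are exact quantiles taken from a sorted copy of the input;
     * a t-digest based version would return approximate quantiles instead.
     */
    static double[] boundaries(double[] values, int numBuckets) {
        if (values.length == 0 || numBuckets < 1) {
            throw new IllegalArgumentException("need at least one value and one bucket");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);

        double[] bounds = new double[numBuckets + 1];
        for (int i = 0; i <= numBuckets; i++) {
            // Position of the i/numBuckets quantile in the sorted array,
            // rounded to the nearest index.
            int idx = (int) Math.round((double) i / numBuckets * (sorted.length - 1));
            bounds[i] = sorted[idx];
        }
        return bounds;
    }
}
```

On skewed input, adjacent boundaries can coincide or sit very close together where the data is dense, while a single bucket can span a wide range where the data is sparse. Because an approximate quantile sketch may return slightly different boundary values than the exact computation, assertions on exact boundaries are fragile, which is consistent with the testing concern raised above.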
   
   

