[ 
https://issues.apache.org/jira/browse/PHOENIX-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469198#comment-16469198
 ] 

Maryann Xue commented on PHOENIX-4724:
--------------------------------------

Yes, I agree with [~jamestaylor] that this information can be useful for the 
query optimizer. Right now, for WHERE clause conditions other than filters on 
the primary key, we can only make a very rough guess at the number of 
rows/bytes in the filtered output. A histogram can give a much more accurate 
estimate for filter conditions on the columns it covers. For example, for a 
range or equality condition on such a column, we can estimate the filtered 
rows/bytes as (total rows/bytes) * (number of buckets that fall in the range / 
total number of buckets), since each bucket of an equi-depth histogram holds 
roughly the same number of rows.
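
To make the estimate concrete, here is a minimal sketch in Python. It is not 
Phoenix code: the function name, the representation of the histogram as a 
sorted list of bucket boundaries, and the linear interpolation for partially 
covered buckets are all assumptions made for illustration.

```python
def estimate_filtered_rows(boundaries, total_rows, low, high):
    """Estimate how many rows satisfy low <= value <= high.

    boundaries: sorted bucket boundaries b[0] <= b[1] <= ... <= b[n] of an
                equi-depth histogram, so each of the n buckets is assumed to
                hold about total_rows / n rows (hypothetical representation).
    """
    num_buckets = len(boundaries) - 1
    if num_buckets <= 0:
        raise ValueError("need at least two boundaries")

    covered = 0.0  # number of buckets covered by the range, possibly fractional
    for i in range(num_buckets):
        bucket_low, bucket_high = boundaries[i], boundaries[i + 1]
        overlap_low = max(low, bucket_low)
        overlap_high = min(high, bucket_high)
        if overlap_high < overlap_low:
            continue  # the range does not touch this bucket
        width = bucket_high - bucket_low
        if width == 0:
            covered += 1.0  # bucket holds a single repeated value
        else:
            # Assumption: values are spread uniformly inside a bucket.
            covered += (overlap_high - overlap_low) / width

    selectivity = covered / num_buckets
    return selectivity * total_rows


# Four buckets over [0, 40], 1000 rows in total.
bounds = [0, 10, 20, 30, 40]
print(estimate_filtered_rows(bounds, 1000, 10, 30))  # two full buckets
print(estimate_filtered_rows(bounds, 1000, 5, 15))   # two half buckets
```

The same fraction can be applied to a total byte count instead of a row count 
to estimate the filtered bytes.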

> Efficient Equi-Depth histogram for streaming data
> -------------------------------------------------
>
>                 Key: PHOENIX-4724
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4724
>             Project: Phoenix
>          Issue Type: Sub-task
>    Affects Versions: 4.15.0
>            Reporter: Vincent Poon
>            Assignee: Vincent Poon
>            Priority: Major
>         Attachments: PHOENIX-4724.v1.patch, PHOENIX-4724.v2.patch
>
>
> Equi-Depth histogram from 
> http://web.cs.ucla.edu/~zaniolo/papers/Histogram-EDBT2011-CamReady.pdf, but 
> without the sliding window - we assume a single window over the entire data 
> set.
> Used to generate the bucket boundaries of a histogram where each bucket has 
> the same # of items.
> This is useful, for example, for pre-splitting an index table, by feeding in 
> data from the indexed column.
> Works on streaming data - the histogram is dynamically updated for each new 
> value.
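
As a rough illustration of what "bucket boundaries where each bucket has the 
same number of items" means, the sketch below computes them offline from a 
fully sorted list. This is deliberately not the streaming algorithm from the 
paper or the attached patch, which updates the histogram per value; the 
function name and the quantile-index rule are assumptions for illustration.

```python
def equi_depth_boundaries(values, num_buckets):
    """Return num_buckets + 1 boundaries that split values into buckets
    of (nearly) equal item counts. Offline version: sorts all the data."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("values must not be empty")

    boundaries = [ordered[0]]
    for i in range(1, num_buckets):
        # The i-th interior boundary sits at the i/num_buckets quantile.
        boundaries.append(ordered[(i * n) // num_buckets])
    boundaries.append(ordered[-1])
    return boundaries


# Twelve values, four buckets of three items each.
print(equi_depth_boundaries(range(1, 13), 4))  # [1, 4, 7, 10, 12]
```

For the pre-splitting use case, the interior boundaries (here 4, 7 and 10) 
would be the candidate split points.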



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
