[ 
https://issues.apache.org/jira/browse/HIVE-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-9689:
----------------------------------------
    Labels: gsoc gsoc2015 hive java  (was: gsoc2015)

> Store distinct value estimator's bit vectors in metastore
> ---------------------------------------------------------
>
>                 Key: HIVE-9689
>                 URL: https://issues.apache.org/jira/browse/HIVE-9689
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Prasanth Jayachandran
>              Labels: gsoc, gsoc2015, hive, java
>
> Hive currently uses the PCSA (Probabilistic Counting with Stochastic
> Averaging) algorithm to estimate the number of distinct values (NDV). Only
> the NDV estimate computed by the UDF is stored in the metastore, not the
> underlying bit vectors. This makes it impossible to estimate the overall NDV
> across all partitions (or a selected subset), because per-partition
> estimates cannot be combined without the bit vectors. We should ideally
> store the bit vectors in the metastore and merge them on the server side. We
> could also replace the current PCSA algorithm with HyperLogLog if space is a
> constraint.
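
For illustration, here is a minimal sketch of why keeping the bit vectors
matters: two PCSA sketches built on different partitions can be merged with a
bitwise OR, and the result is identical to a sketch built over the union of
the data. This is not Hive's implementation. The class name PcsaSketch, the
bucket count, the 32-bit bitmap width, and the hash function (FNV-1a followed
by a MurmurHash3-style finalizer) are all assumptions chosen for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical, simplified PCSA sketch; not the class used by Hive. */
final class PcsaSketch {
    // Bias-correction constant from the Flajolet-Martin analysis.
    private static final double PHI = 0.77351;

    private final int[] bitmaps; // one 32-bit bit vector per bucket

    PcsaSketch(int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        this.bitmaps = new int[numBuckets];
    }

    /** Assumed hash: 64-bit FNV-1a followed by a MurmurHash3-style finalizer. */
    private static long hash64(String value) {
        long h = 0xcbf29ce484222325L;
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    /** Records one value: the high hash bits pick a bucket, the low bits a bit. */
    void add(String value) {
        long h = hash64(value);
        int bucket = (int) ((h >>> 32) % bitmaps.length);
        // Position of the lowest set bit, capped so it fits in a 32-bit bitmap.
        int rank = Math.min(31, Integer.numberOfTrailingZeros((int) h));
        bitmaps[bucket] |= (1 << rank);
    }

    /** Merges two sketches of the same size by OR-ing their bit vectors. */
    PcsaSketch merge(PcsaSketch other) {
        if (other.bitmaps.length != bitmaps.length) {
            throw new IllegalArgumentException("bucket counts differ");
        }
        PcsaSketch merged = new PcsaSketch(bitmaps.length);
        for (int i = 0; i < bitmaps.length; i++) {
            merged.bitmaps[i] = bitmaps[i] | other.bitmaps[i];
        }
        return merged;
    }

    /** PCSA estimate: (m / PHI) * 2^(mean index of the lowest unset bit). */
    double estimate() {
        int sumR = 0;
        for (int bitmap : bitmaps) {
            sumR += Integer.numberOfTrailingZeros(~bitmap);
        }
        int m = bitmaps.length;
        return (m / PHI) * Math.pow(2.0, (double) sumR / m);
    }

    /** Returns a copy of the bit vectors, e.g. for serialization. */
    int[] bitmaps() {
        return Arrays.copyOf(bitmaps, bitmaps.length);
    }
}
```

Under these assumptions, a server could store the array returned by
bitmaps() for each partition and call merge() over the selected partitions
before calling estimate(). Two plain NDV numbers cannot be combined this way,
since values shared between partitions would be counted twice.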



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
