> On July 10, 2017, 10:02 p.m., Prasanth_J wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
> > Line 1723 (original), 1723 (patched)
> > <https://reviews.apache.org/r/60753/diff/1/?file=1773281#file1773281line1723>
> >
> > I am not sure if we need this config.
> > Any reason to support storing FM sketch's bitvectors?
> >
> > If this is for 3.0.0 release only, we could even remove this config.
> >
> > If we need to support FM sketch based NDV then having a separate field
> > in metastore will be better as Ashutosh suggested.
>
> pengcheng xiong wrote:
> This config is just to give user an option to choose whether to use FM or
> HLL to COMPUTE ndv. This patch does not change the fact that we do not store
> bit vectors for both FM and NDV in object store.
>
> Prasanth_J wrote:
> I don't think it will be useful for user to tune this.
> IMHO, most users won't care a lot about this. This can be used as
> fallback but this will only add more checks.
>
> Ashutosh Chauhan wrote:
> I agree with Prasanth. Overloading this config seems hacky. It is leading
> to confusing code, e.g, NDVEstimator uses number of bit vectors to determine
> ndv algo, thats confusing. Its confusing for end user too that +ve value from
> [0,100] is useful for one algo, but -ve value is only to switch over to diff
> algo. I suggest that we leave this config as is and introduce a new config
> which dictates which algo is used for ndv computation. This config value is
> than passed to compute_stats() udaf by ColumnStatsSemanticAnalyzer.
> GenericUDAFComputeStats than uses this config to determine which algo its
> using.
> This config we can store in metastore (either as is or mapped as int) so
> that we can deserialize the bit vectors correctly when we retrieve them.
>
> pengcheng xiong wrote:
> I agree it is a bit confusing but then we need to store two extra
> information in a metastore: (1) the algorithm that we use, FMSketch or HLL
> and (2) the number of bitvectors for FMSketch.
We can just store algo. For # of bit vectors for FMSketch we can use existing
logic of counting '{' in serialized string for bit vectors.
- Ashutosh
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60753/#review180119
-----------------------------------------------------------
On July 10, 2017, 9:29 p.m., pengcheng xiong wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60753/
> -----------------------------------------------------------
>
> (Updated July 10, 2017, 9:29 p.m.)
>
>
> Review request for hive, Ashutosh Chauhan and Prasanth_J.
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> HIVE-16966
>
>
> Diffs
> -----
>
> common/pom.xml e6722babd8
> common/src/java/org/apache/hadoop/hive/common/HiveStatsUtils.java
> 7c9d72fbd2
>
> common/src/java/org/apache/hadoop/hive/common/ndv/NumDistinctValueEstimator.java
> PRE-CREATION
>
> common/src/java/org/apache/hadoop/hive/common/ndv/NumDistinctValueEstimatorFactory.java
> PRE-CREATION
> common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLConstants.java
> PRE-CREATION
> common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLDenseRegister.java
> PRE-CREATION
> common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLRegister.java
> PRE-CREATION
>
> common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLSparseRegister.java
> PRE-CREATION
> common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java
> PRE-CREATION
> common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLogUtils.java
> PRE-CREATION
> common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 5700fb9325
> common/src/java/org/apache/hadoop/hive/conf/HiveConf.java.orig da48a7ccbd
>
> metastore/src/java/org/apache/hadoop/hive/metastore/NumDistinctValueEstimator.java
> 92f9a845e3
>
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/DecimalColumnStatsAggregator.java
> 36b2c9c56b
>
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/DoubleColumnStatsAggregator.java
> a88ef84e5c
>
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/LongColumnStatsAggregator.java
> 8ac6561aec
>
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/StringColumnStatsAggregator.java
> 2aa4046a46
>
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/ColumnStatsMerger.java
> 33c7e3e52c
>
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/ColumnStatsMergerFactory.java
> fe890e4e27
>
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DateColumnStatsMerger.java
> 3179b23438
>
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DecimalColumnStatsMerger.java
> c13add9d9c
>
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DoubleColumnStatsMerger.java
> fbdba24b0a
>
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/LongColumnStatsMerger.java
> ac65590505
>
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/StringColumnStatsMerger.java
> 41587477d3
> pom.xml f9fae59a5d
>
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DecimalNumDistinctValueEstimator.java
> a05906edfa
>
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DoubleNumDistinctValueEstimator.java
> e76fc74dbc
>
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java
> 2ebfcb2360
>
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/LongNumDistinctValueEstimator.java
> 1c197a028a
>
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/NumDistinctValueEstimator.java
> fa70f49857
>
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/StringNumDistinctValueEstimator.java
> 601901c163
> ql/src/test/queries/clientpositive/hll.q PRE-CREATION
> ql/src/test/results/clientpositive/hll.q.out PRE-CREATION
>
>
> Diff: https://reviews.apache.org/r/60753/diff/1/
>
>
> Testing
> -------
>
>
> Thanks,
>
> pengcheng xiong
>
>