> On Nov. 9, 2017, 7:51 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/llap/auto_sortmerge_join_12.q.out
> > Line 160 (original), 160 (patched)
> > <https://reviews.apache.org/r/63442/diff/2/?file=1886244#file1886244line160>
> >
> >     bucket_small has no stats gathered. This should be NONE.
>
> Zoltan Haindrich wrote:
>     `hive.stats.autogather` is enabled by default from `HiveConf`
>
> Ashutosh Chauhan wrote:
>     Those are load statements, not inserts. We don't gather stats with load
>     statements, only with inserts.
>
> Zoltan Haindrich wrote:
>     sorry, you are right: basic stats are not gathered in this case in any
>     way.
>
>     But the stat state is COMPLETE because there is logic that scans the
>     file sizes to calculate the data sizes, and from there HIVE-16811 can
>     guess some row counts:
>
>     https://github.com/kgyrtkirk/hive/blob/9f67a878512117eb5c251794adc1a91bae62fea7/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L386-L393
>
>     First, I would like to make the calculations for standalone tables and
>     partitioned tables a bit more similar to each other.
>
>     I've tried to come up with some definitions for NONE/PARTIAL/COMPLETE;
>     currently I would say the following:
>
>     * NONE: not known
>       * on table: no information (afaik currently this can't happen)
>       * estimation tree: all nodes in the estimation tree were NONE
>     * PARTIAL:
>       * on table: the current information is estimated from data size
>       * estimation tree: contains at least one NONE/PARTIAL
>     * COMPLETE:
>       * current information is correct (calculated by stats tasks)
>       * estimation tree: the whole subtree has COMPLETE status
>
>     If I use these definitions, then I would say that the filesystem-size-based
>     estimation should be considered PARTIAL.
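For reference, the definitions quoted above could be written down roughly as follows. This is only a sketch under stated assumptions, not the code in the patch: the class `StatStateSketch`, its `State` enum, and the methods `forTable` and `merge` are hypothetical names chosen for illustration. It also makes two choices the definitions leave open: a tree whose nodes are all NONE is reported as NONE rather than PARTIAL, and a node with no children is treated as NONE.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the NONE/PARTIAL/COMPLETE definitions from the thread.
// None of these names are taken from the Hive code base.
class StatStateSketch {

    enum State { NONE, PARTIAL, COMPLETE }

    // State of a single table.
    // COMPLETE: the numbers were calculated by a stats task.
    // PARTIAL:  the numbers were only estimated from the data size.
    // NONE:     nothing is known.
    static State forTable(boolean computedByStatsTask, boolean estimatedFromDataSize) {
        if (computedByStatsTask) {
            return State.COMPLETE;
        }
        return estimatedFromDataSize ? State.PARTIAL : State.NONE;
    }

    // State of an estimation tree node, derived from its children.
    // All NONE -> NONE, all COMPLETE -> COMPLETE, anything else -> PARTIAL.
    // Assumption: an empty child list is reported as NONE.
    static State merge(List<State> children) {
        boolean allNone = true;
        boolean allComplete = true;
        for (State s : children) {
            if (s != State.NONE) {
                allNone = false;
            }
            if (s != State.COMPLETE) {
                allComplete = false;
            }
        }
        if (allNone) {
            return State.NONE;
        }
        return allComplete ? State.COMPLETE : State.PARTIAL;
    }

    public static void main(String[] args) {
        // A table loaded with LOAD DATA: no stats task ran, size-based estimate only.
        System.out.println(forTable(false, true)); // prints PARTIAL
        // A join of a fully analyzed table with such a table.
        System.out.println(merge(Arrays.asList(State.COMPLETE, State.PARTIAL))); // prints PARTIAL
    }
}
```

Under this reading, the `bucket_small` case discussed above (loaded, not inserted) would come out as PARTIAL, which matches the conclusion in the quoted text.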
The definitions sound good. Let's use them to make sure our state calculation logic is built on them. Can you also add them as code comments?


- Ashutosh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63442/#review190633
-----------------------------------------------------------


On Nov. 9, 2017, 5:39 p.m., Zoltan Haindrich wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63442/
> -----------------------------------------------------------
>
> (Updated Nov. 9, 2017, 5:39 p.m.)
>
>
> Review request for hive and Ashutosh Chauhan.
>
>
> Bugs: HIVE-17934
>     https://issues.apache.org/jira/browse/HIVE-17934
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> * remove the reactive stat state guessing method
> * make the guessing only work when a new object is created
> * change the way stat objects are merged
>
> this patch will most probably break almost all qtest outputs....
>
>
> Diffs
> -----
>
>   accumulo-handler/src/test/results/positive/accumulo_queries.q.out b3adf4e504
>   hbase-handler/src/test/results/positive/hbase_queries.q.out b2eda12e95
>   hbase-handler/src/test/results/positive/hbasestats.q.out 29eefd43a9
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java 7a3fae65e8
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java a4f60accce
>   ql/src/java/org/apache/hadoop/hive/ql/plan/Statistics.java 8ffb4ce44b
>   ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java ce7c96c639
>   ql/src/test/queries/clientpositive/lateral_view_onview2.q PRE-CREATION
>   ql/src/test/queries/clientpositive/stats_empty_partition2.q PRE-CREATION
>   ql/src/test/results/clientpositive/acid_table_stats.q.out 351ff0da0a
>   ql/src/test/results/clientpositive/alterColumnStatsPart.q.out 858e16fe22
>   ql/src/test/results/clientpositive/annotate_stats_part.q.out 3a94a6a4e3
>   ql/src/test/results/clientpositive/auto_sortmerge_join_12.q.out 7875e9693a
>   ql/src/test/results/clientpositive/cbo_const.q.out e9f885b363
>   ql/src/test/results/clientpositive/cbo_input26.q.out 77fc194829
>   ql/src/test/results/clientpositive/columnstats_partlvl_dp.q.out 414b715b7a
>   ql/src/test/results/clientpositive/columnstats_quoting.q.out 683c1e274f
>   ql/src/test/results/clientpositive/columnstats_tbllvl.q.out a2c6ead293
>   ql/src/test/results/clientpositive/constGby.q.out c633624935
>   ql/src/test/results/clientpositive/constant_prop_3.q.out cba4744866
>   ql/src/test/results/clientpositive/constprog3.q.out f54168d0ee
>   ql/src/test/results/clientpositive/correlationoptimizer10.q.out a03acd38a7
>   ql/src/test/results/clientpositive/correlationoptimizer11.q.out cf2250790a
>   ql/src/test/results/clientpositive/correlationoptimizer13.q.out 6d4f931213
>   ql/src/test/results/clientpositive/correlationoptimizer14.q.out 149f33fee8
>   ql/src/test/results/clientpositive/correlationoptimizer15.q.out 2d813b239f
>   ql/src/test/results/clientpositive/correlationoptimizer5.q.out 68d6a54862
>   ql/src/test/results/clientpositive/correlationoptimizer7.q.out 82fecab594
>   ql/src/test/results/clientpositive/correlationoptimizer8.q.out f3cb988a03
>   ql/src/test/results/clientpositive/correlationoptimizer9.q.out 5372408d2a
>   ql/src/test/results/clientpositive/cte_mat_5.q.out 3747cec891
>   ql/src/test/results/clientpositive/display_colstats_tbllvl.q.out 8e2e77b077
>   ql/src/test/results/clientpositive/druid_basic2.q.out 753ccb456f
>   ql/src/test/results/clientpositive/empty_join.q.out a4a9976a7f
>   ql/src/test/results/clientpositive/filter_cond_pushdown_HIVE_15647.q.out 779bea3a26
>   ql/src/test/results/clientpositive/groupby_sort_6.q.out a66ec97642
>   ql/src/test/results/clientpositive/having2.q.out 80301bfc04
>   ql/src/test/results/clientpositive/input23.q.out 80ee81b654
>   ql/src/test/results/clientpositive/input26.q.out 1ac082eedf
>   ql/src/test/results/clientpositive/join_cond_pushdown_unqual1.q.out 74f45e58c0
>   ql/src/test/results/clientpositive/join_cond_pushdown_unqual2.q.out 2ac67b294c
>   ql/src/test/results/clientpositive/join_cond_pushdown_unqual3.q.out b8d9b408d7
>   ql/src/test/results/clientpositive/join_cond_pushdown_unqual4.q.out e5ddc3507f
>   ql/src/test/results/clientpositive/join_view.q.out 1d83742dd4
>   ql/src/test/results/clientpositive/lateral_view_onview.q.out 423885e442
>   ql/src/test/results/clientpositive/lateral_view_onview2.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/list_bucket_query_oneskew_2.q.out 876434fb4e
>   ql/src/test/results/clientpositive/llap/auto_sortmerge_join_12.q.out 3acbb207a7
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out 67fe41e223
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction_sw.q.out 1c672ef068
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_user_level.q.out a51637a2b9
>   ql/src/test/results/clientpositive/llap/dynpart_sort_optimization_acid.q.out 02cadb7cff
>   ql/src/test/results/clientpositive/llap/llap_nullscan.q.out 2a891234e5
>   ql/src/test/results/clientpositive/llap/mapjoin_hint.q.out 505524e78c
>   ql/src/test/results/clientpositive/llap/mapreduce1.q.out 0e94e71d27
>   ql/src/test/results/clientpositive/llap/mapreduce2.q.out 6485f587f8
>   ql/src/test/results/clientpositive/llap/metadataonly1.q.out e6853b23e3
>   ql/src/test/results/clientpositive/llap/reduce_deduplicate.q.out 65b74ee319
>   ql/src/test/results/clientpositive/llap/subquery_in.q.out c7b98d3967
>   ql/src/test/results/clientpositive/llap/subquery_multi.q.out d1579033ac
>   ql/src/test/results/clientpositive/llap/subquery_null_agg.q.out 78ee174935
>   ql/src/test/results/clientpositive/llap/subquery_scalar.q.out 06a929dd0a
>   ql/src/test/results/clientpositive/llap/subquery_select.q.out 514a7889b3
>   ql/src/test/results/clientpositive/llap/tez_smb_empty.q.out 7a4db158c8
>   ql/src/test/results/clientpositive/llap/vector_windowing_gby2.q.out ce1881b7fb
>   ql/src/test/results/clientpositive/llap/vector_windowing_streaming.q.out 61730f59ee
>   ql/src/test/results/clientpositive/llap/vectorization_short_regress.q.out 3e246bcbe6
>   ql/src/test/results/clientpositive/materialized_view_rewrite_ssb.q.out de491989a5
>   ql/src/test/results/clientpositive/materialized_view_rewrite_ssb_2.q.out a11d66815a
>   ql/src/test/results/clientpositive/nullgroup3.q.out fe23f39fd8
>   ql/src/test/results/clientpositive/nullgroup5.q.out 783f6d76b6
>   ql/src/test/results/clientpositive/partial_column_stats.q.out 44db81a443
>   ql/src/test/results/clientpositive/perf/spark/query66.q.out 1dc0fac408
>   ql/src/test/results/clientpositive/perf/spark/query99.q.out c0c5f136ec
>   ql/src/test/results/clientpositive/position_alias_test_1.q.out ee81a79a0b
>   ql/src/test/results/clientpositive/ppd_outer_join5.q.out 84c10828ce
>   ql/src/test/results/clientpositive/ppd_repeated_alias.q.out c94002f37d
>   ql/src/test/results/clientpositive/row__id.q.out 9aab097f21
>   ql/src/test/results/clientpositive/semijoin4.q.out 53f6c174bd
>   ql/src/test/results/clientpositive/spark/auto_sortmerge_join_12.q.out 09caf944d2
>   ql/src/test/results/clientpositive/spark/join_cond_pushdown_unqual1.q.out dc9b61e39a
>   ql/src/test/results/clientpositive/spark/join_cond_pushdown_unqual2.q.out 82634fba44
>   ql/src/test/results/clientpositive/spark/join_cond_pushdown_unqual3.q.out d1b20006b0
>   ql/src/test/results/clientpositive/spark/join_cond_pushdown_unqual4.q.out 2bfc81d275
>   ql/src/test/results/clientpositive/spark/join_view.q.out 61867f75f3
>   ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out d294f4910c
>   ql/src/test/results/clientpositive/spark/ppd_outer_join5.q.out e49260aa35
>   ql/src/test/results/clientpositive/spark/semijoin.q.out d2dac10f3f
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_7.q.out e2f68a02bc
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out d7b445baf8
>   ql/src/test/results/clientpositive/spark/spark_vectorized_dynamic_partition_pruning.q.out 1a8e9ffcc5
>   ql/src/test/results/clientpositive/spark/subquery_in.q.out fd25e36fba
>   ql/src/test/results/clientpositive/spark/subquery_multi.q.out b91c33ee4a
>   ql/src/test/results/clientpositive/spark/subquery_null_agg.q.out 945e2a7102
>   ql/src/test/results/clientpositive/spark/subquery_scalar.q.out 8f3ac0d636
>   ql/src/test/results/clientpositive/spark/subquery_select.q.out edb2b92f73
>   ql/src/test/results/clientpositive/spark/union_remove_25.q.out f681428785
>   ql/src/test/results/clientpositive/spark/vectorization_short_regress.q.out 78740fec6f
>   ql/src/test/results/clientpositive/stats_empty_partition2.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/subquery_exists_having.q.out ef06dfe697
>   ql/src/test/results/clientpositive/subquery_unqualcolumnrefs.q.out 79b7d83619
>   ql/src/test/results/clientpositive/temp_table_display_colstats_tbllvl.q.out a202e45be9
>   ql/src/test/results/clientpositive/union_remove_25.q.out 20ab809cb1
>   ql/src/test/results/clientpositive/union_view.q.out 35f8a9a226
>
>
> Diff: https://reviews.apache.org/r/63442/diff/2/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Zoltan Haindrich
>
>