asolimando commented on PR #3137: URL: https://github.com/apache/hive/pull/3137#issuecomment-1342807270
> Another question I don't see here is how we generate the histogram statistics? by issuing an "analyze table" command?

That was hard for me to figure out at first, too.

Statistics computation happens via an aggregate query, where different `UDAF`s are used to compute the different statistics.

[ColumnStatsSemanticAnalyzer.java#L308-L325](https://github.com/apache/hive/blob/1e9e51dbb5ab5acd4d5a05eff31752a5997beb03/ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java#L308-L325) generates the `SELECT` statement for the stats. It then calls [ColumnStatsSemanticAnalyzer.java#L327](https://github.com/apache/hive/blob/1e9e51dbb5ab5acd4d5a05eff31752a5997beb03/ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java#L327), which has an enum with the different statistics. We added a new entry for histograms and generated the code accordingly (see [ColumnStatsSemanticAnalyzer.java#L355-L357](https://github.com/apache/hive/blob/1e9e51dbb5ab5acd4d5a05eff31752a5997beb03/ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java#L355-L357)).

Finally, the UDAF part is generated here: [ColumnStatsSemanticAnalyzer.java#L494-L519](https://github.com/apache/hive/blob/1e9e51dbb5ab5acd4d5a05eff31752a5997beb03/ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java#L494-L519).
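To make the flow above easier to picture, here is a rough, self-contained sketch of the pattern: an enum lists the statistics, and each entry contributes one aggregate expression to the generated `SELECT`. This is not the code in `ColumnStatsSemanticAnalyzer`; the class `StatsQuerySketch`, the enum `ColumnStat`, the method `buildSelect`, and the function name `histogram_udaf_placeholder` are all made up for illustration, and the actual UDAF names and arguments are the ones in the linked lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StatsQuerySketch {

    // Hypothetical enum: one entry per statistic, each carrying a template
    // for the aggregate expression that computes it.
    enum ColumnStat {
        MIN("min(%s)"),
        MAX("max(%s)"),
        COUNT_NULLS("count(1) - count(%s)"),
        // Placeholder name only; the real histogram UDAF is defined in the PR.
        HISTOGRAM("histogram_udaf_placeholder(%s)");

        private final String template;

        ColumnStat(String template) {
            this.template = template;
        }

        String render(String column) {
            return String.format(template, column);
        }
    }

    // Builds one aggregate query covering every statistic for every column.
    public static String buildSelect(String table, List<String> columns) {
        List<String> exprs = new ArrayList<>();
        for (String column : columns) {
            for (ColumnStat stat : ColumnStat.values()) {
                exprs.add(stat.render(column));
            }
        }
        return "SELECT " + String.join(", ", exprs) + " FROM " + table;
    }

    public static void main(String[] args) {
        System.out.println(buildSelect("t", Arrays.asList("a")));
    }
}
```

With this shape, supporting a new statistic such as histograms comes down to adding one enum entry and its expression template, which is roughly what the change described above does.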
