[GitHub] [hive] asolimando commented on pull request #3137: HIVE-26221: Add histogram-based column statistics

GitBox Thu, 08 Dec 2022 06:18:33 -0800


asolimando commented on PR #3137:
URL: https://github.com/apache/hive/pull/3137#issuecomment-1342807270


   > Another question I don't see here is how we generate the histogram 
statistics? by issuing an "analyze table" command?
   
   That was hard for me to figure out at first, too. Statistics computation 
happens via an aggregate query, in which different `UDAF`s compute the 
different statistics.
   
   
[ColumnStatsSemanticAnalyzer.java#L308-L325](https://github.com/apache/hive/blob/1e9e51dbb5ab5acd4d5a05eff31752a5997beb03/ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java#L308-L325)
 generates the `SELECT` statement for the stats.
   
   It then calls 
[ColumnStatsSemanticAnalyzer.java#L327](https://github.com/apache/hive/blob/1e9e51dbb5ab5acd4d5a05eff31752a5997beb03/ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java#L327),
 which has an enum with the different statistics. We added a new 
one for histograms and generated the code accordingly (see 
[ColumnStatsSemanticAnalyzer.java#L355-L357](https://github.com/apache/hive/blob/1e9e51dbb5ab5acd4d5a05eff31752a5997beb03/ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java#L355-L357)).
   
   Finally, the UDAF part is generated here: 
[ColumnStatsSemanticAnalyzer.java#L494-L519](https://github.com/apache/hive/blob/1e9e51dbb5ab5acd4d5a05eff31752a5997beb03/ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java#L494-L519).
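
To make the flow above concrete, here is a minimal sketch of the general pattern: an enum lists the statistics, and each entry contributes one aggregate expression to a generated `SELECT`. This is not the code in `ColumnStatsSemanticAnalyzer`. The class, enum, and method names are hypothetical, and `hypothetical_histogram` is a placeholder rather than a real Hive UDAF name.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class StatsQuerySketch {

    // Hypothetical stand-in for the enum of per-column statistics.
    // Names and function strings are illustrative, not Hive's actual ones.
    public enum ColumnStat {
        MIN("min"),
        MAX("max"),
        COUNT("count"),
        HISTOGRAM("hypothetical_histogram"); // placeholder UDAF name (assumption)

        private final String function;

        ColumnStat(String function) {
            this.function = function;
        }

        // Render the aggregate call for one column, e.g. min(`price`).
        public String toExpression(String column) {
            return function + "(`" + column + "`)";
        }
    }

    // Build one aggregate query with one projection per enum entry.
    public static String buildSelect(String table, String column) {
        String projections = Arrays.stream(ColumnStat.values())
            .map(stat -> stat.toExpression(column))
            .collect(Collectors.joining(", "));
        return "SELECT " + projections + " FROM `" + table + "`";
    }

    public static void main(String[] args) {
        System.out.println(buildSelect("sales", "price"));
    }
}
```

With this shape, supporting a new statistic comes down to adding an enum entry. The query builder picks it up without further changes, which mirrors the idea of adding a histogram entry described above.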



