Thomas Rebele created HIVE-29654:
------------------------------------

             Summary: Provide an easy way to execute a query and get the CBO 
plan at the same time
                 Key: HIVE-29654
                 URL: https://issues.apache.org/jira/browse/HIVE-29654
             Project: Hive
          Issue Type: New Feature
            Reporter: Thomas Rebele


It would be nice if Hive provided an easy way to execute a query and get the 
CBO and stage plans of that execution. Currently this is easy to achieve for 
the stage plans, but not for the CBO plan (see background section below).

Some approaches:
 * output CBO plans if {{hive.log.explain.output}} is enabled
 * add another config option (e.g., {{{}hive.log.cbo.plan{}}}) to output the 
CBO plans when the query is executed
 * add a new command EXPLAIN EXECUTE that prints the query plans (CBO and stage 
plans) and then executes these plans

----
Background: 

I've looked into the results of a TPC-DS benchmark. I wanted to understand why 
the performance of a query has changed. I had the execution time of the query, 
some logs, and some EXPLAIN plans. Unfortunately the EXPLAIN plans were not 
helpful, as the plans were not the same (different operator ids).

I did an experiment with TPC-DS metadata and KLL histograms enabled. I executed 
EXPLAIN CBO JOINCOST several times in the same session. For 8 runs, I got 5 
different plans.

I had a chat with an AI tool, and it suggested some options to get the plan. 
After a while I got it working:
 * stage plans: {{SET hive.server2.logging.operation.level = VERBOSE; SET 
hive.log.explain.output=true;}}
 * CBO plan: enable DEBUG logging for 
{{org.apache.hadoop.hive.ql.parse.CalcitePlanner}} in 
{{{}hive-log4j2.properties{}}}. Also executing {{SET 
hive.server2.logging.operation.level = VERBOSE;}} helps, as the plans will then 
show up in beeline.

It would be nice to make this a bit simpler. It might be the case that the user 
cannot easily change the log4j2 properties. Also, setting the DEBUG level 
globally just to get the plan for one query looks like a lot of overhead, if 
other users are running queries at the same time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to