[ https://issues.apache.org/jira/browse/HIVE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chao updated HIVE-9136:
-----------------------
    Status: Patch Available  (was: Open)

Patch v1. I added several Spark-specific log events to {{PerfLogger}}. The correspondence against Tez is:

|| In Tez || In Spark ||
| TEZ_SUBMIT_TO_RUNNING | SPARK_SUBMIT_TO_RUNNING |
| TEZ_BUILD_DAG | SPARK_BUILD_PLAN + SPARK_BUILD_RDD_GRAPH |
| TEZ_SUBMIT_DAG | SPARK_SUBMIT_JOB |
| TEZ_RUN_DAG | SPARK_RUN_JOB |
| TEZ_CREATE_VERTEX | SPARK_CREATE_TRAN |
| TEZ_RUN_VERTEX | SPARK_RUN_STAGE |
| TEZ_INITIALIZE_PROCESSOR | ? |
| TEZ_RUN_PROCESSOR | ? |
| TEZ_INITIALIZE_OPERATORS | SPARK_INITIALIZE_OPERATORS |

For TEZ_INITIALIZE_PROCESSOR and TEZ_RUN_PROCESSOR, I didn't find a counterpart in our Spark branch. Any ideas? Maybe log the {{SparkBaseFunctionResultList}}?

In addition, I added SPARK_FLUSH_HASHTABLE, to track the performance of the Spark hash table sink, and SPARK_GENERATE_OPERATOR_TREE, to track, as the name suggests, the time spent generating the operator tree.

I'm also open to any kind of suggestions.

> Profile query compiler [Spark Branch]
> -------------------------------------
>
>                 Key: HIVE-9136
>                 URL: https://issues.apache.org/jira/browse/HIVE-9136
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>    Affects Versions: spark-branch
>            Reporter: Brock Noland
>            Assignee: Chao
>         Attachments: HIVE-9136.1.patch
>
>
> We should put some performance counters around the compiler and evaluate how
> long it takes to compile a query in Spark versus the other execution
> frameworks. Query 28 is a good one to use for testing.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
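
For readers comparing the numbers, the Tez-to-Spark mapping in the comment above means that TEZ_BUILD_DAG should be compared against the sum of SPARK_BUILD_PLAN and SPARK_BUILD_RDD_GRAPH. The sketch below illustrates that with a begin/end timer. Note that {{PhaseTimer}}, its methods, and the string values of the event keys are hypothetical stand-ins made up for this illustration; this is not Hive's {{PerfLogger}} and not the code in the attached patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical begin/end phase timer; a stand-in, NOT Hive's PerfLogger. */
public class PhaseTimer {
    // Event names come from the comment above; the string values are assumptions.
    public static final String SPARK_BUILD_PLAN = "SPARK_BUILD_PLAN";
    public static final String SPARK_BUILD_RDD_GRAPH = "SPARK_BUILD_RDD_GRAPH";

    private final LongSupplier clock;  // injectable so the timer can be tested
    private final Map<String, Long> startTimes = new LinkedHashMap<>();
    private final Map<String, Long> durations = new LinkedHashMap<>();

    public PhaseTimer(LongSupplier clock) {
        this.clock = clock;
    }

    /** Records the start time of an event. */
    public void begin(String event) {
        startTimes.put(event, clock.getAsLong());
    }

    /** Closes an event, accumulates its elapsed time, and returns it. */
    public long end(String event) {
        Long start = startTimes.remove(event);
        if (start == null) {
            throw new IllegalStateException("end() without begin(): " + event);
        }
        long elapsed = clock.getAsLong() - start;
        durations.merge(event, elapsed, Long::sum);
        return elapsed;
    }

    /** Total time recorded for an event so far, or 0 if it never ran. */
    public long duration(String event) {
        return durations.getOrDefault(event, 0L);
    }

    /** Sum of the two Spark phases that the table maps to TEZ_BUILD_DAG. */
    public long buildDagEquivalent() {
        return duration(SPARK_BUILD_PLAN) + duration(SPARK_BUILD_RDD_GRAPH);
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer(System::currentTimeMillis);

        timer.begin(SPARK_BUILD_PLAN);
        // ... build the Spark plan here ...
        timer.end(SPARK_BUILD_PLAN);

        timer.begin(SPARK_BUILD_RDD_GRAPH);
        // ... build the RDD graph here ...
        timer.end(SPARK_BUILD_RDD_GRAPH);

        System.out.println("TEZ_BUILD_DAG equivalent (ms): " + timer.buildDagEquivalent());
    }
}
```

Passing the clock in as a {{LongSupplier}} is only a convenience for testing the timer with fixed timestamps; a real logger would read the system clock directly.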