[ 
https://issues.apache.org/jira/browse/HIVE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao updated HIVE-9136:
-----------------------
    Status: Patch Available  (was: Open)

Patch v1. I added several Spark-specific log events to {{PerfLogger}}. The 
correspondence with the Tez events is:

|| In Tez || In Spark ||
| TEZ_SUBMIT_TO_RUNNING | SPARK_SUBMIT_TO_RUNNING |
| TEZ_BUILD_DAG | SPARK_BUILD_PLAN + SPARK_BUILD_RDD_GRAPH |
| TEZ_SUBMIT_DAG | SPARK_SUBMIT_JOB |
| TEZ_RUN_DAG | SPARK_RUN_JOB |
| TEZ_CREATE_VERTEX | SPARK_CREATE_TRAN |
| TEZ_RUN_VERTEX | SPARK_RUN_STAGE |
| TEZ_INITIALIZE_PROCESSOR | ? |
| TEZ_RUN_PROCESSOR | ? |
| TEZ_INITIALIZE_OPERATORS | SPARK_INITIALIZE_OPERATORS |

For TEZ_INITIALIZE_PROCESSOR and TEZ_RUN_PROCESSOR, I couldn't find a 
counterpart in our Spark branch. Any ideas? Maybe we could log around 
{{SparkBaseFunctionResultList}}?

In addition, I added SPARK_FLUSH_HASHTABLE, to track the time spent in the 
Spark hash table sink, and SPARK_GENERATE_OPERATOR_TREE, to track the time 
spent generating the operator tree.
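
For reference, here is a rough sketch of how a pair of begin/end events brackets a phase. This is not the patch code: {{SparkPerfSketch}} is a hypothetical stand-in for {{PerfLogger}}, and its method names and the string values of the two constants are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for PerfLogger, reduced to begin/end timing.
class SparkPerfSketch {
    // Event names come from the table above; the string values are assumed.
    static final String SPARK_BUILD_PLAN = "SparkBuildPlan";
    static final String SPARK_BUILD_RDD_GRAPH = "SparkBuildRDDGraph";

    private final Map<String, Long> startTimes = new HashMap<>();
    private final Map<String, Long> durations = new HashMap<>();

    // Record the start time of an event.
    public void perfLogBegin(String event) {
        startTimes.put(event, System.nanoTime());
    }

    // Close an event and return the elapsed nanoseconds,
    // or -1 if the event was never started.
    public long perfLogEnd(String event) {
        Long start = startTimes.remove(event);
        if (start == null) {
            return -1L;
        }
        long elapsed = System.nanoTime() - start;
        // Accumulate, so repeated phases add up under one event name.
        durations.merge(event, elapsed, Long::sum);
        return elapsed;
    }

    // Total recorded nanoseconds for an event, or -1 if none were recorded.
    public long duration(String event) {
        return durations.getOrDefault(event, -1L);
    }

    public static void main(String[] args) {
        SparkPerfSketch perf = new SparkPerfSketch();

        perf.perfLogBegin(SPARK_BUILD_PLAN);
        // ... build the Spark plan here ...
        perf.perfLogEnd(SPARK_BUILD_PLAN);

        perf.perfLogBegin(SPARK_BUILD_RDD_GRAPH);
        // ... build the RDD graph here ...
        perf.perfLogEnd(SPARK_BUILD_RDD_GRAPH);

        // The sum of the two is what would be compared against TEZ_BUILD_DAG.
        long total = perf.duration(SPARK_BUILD_PLAN)
                + perf.duration(SPARK_BUILD_RDD_GRAPH);
        System.out.println("plan + RDD graph (ns): " + total);
    }
}
```

Keeping the two Spark events separate and summing them afterwards makes it possible to see which half dominates while still comparing the total against the single Tez event.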

I'm open to any other suggestions as well.



> Profile query compiler [Spark Branch]
> -------------------------------------
>
>                 Key: HIVE-9136
>                 URL: https://issues.apache.org/jira/browse/HIVE-9136
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>    Affects Versions: spark-branch
>            Reporter: Brock Noland
>            Assignee: Chao
>         Attachments: HIVE-9136.1.patch
>
>
> We should put some performance counters around the compiler and evaluate how 
> long it takes to compile a query in Spark versus the other execution 
> frameworks. Query 28 is a good one to use for testing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
