[jira] [Commented] (HIVE-8456) Support Hive Counter to collect spark job metric[Spark Branch]

Chengxiang Li (JIRA) Wed, 15 Oct 2014 22:29:45 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-8456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173390#comment-14173390
 ]


Chengxiang Li commented on HIVE-8456:
-------------------------------------

{quote}
1. Shall we think of better names for the new classes? Because the naming (e.g. 
SparkCounterGroup and SparkCounters) seems a little bit confusing to me.
{quote}
The class names are inherited from their MR/Tez counterparts to keep things 
consistent. If the class names are confusing, we could open a ticket to rename 
the MR/Tez/Spark counter classes together in the future.
{quote}
2. Have we defined all the counters in SparkCounters.initializeSparkCounters? 
For example, it seems Operator.HIVECOUNTERFATAL isn't added there.
{quote}
Not yet. It is not easy to find all the necessary counters and register them at 
this time, so I plan to register specific counters as we enable the features 
that depend on Spark counters.
{quote}
3. The Counter enum in operators doesn't seem to be used as "Counter" in hive. 
Rather, it's just kept in statsMap : HashMap<Enum<?>, LongWritable>. Maybe we 
shouldn't add them as SparkCounter? If we do want to wrap them as SparkCounter, 
there're other operators to handle other than MapOperator, e.g. FilterOperator 
and JoinOperator also have such an enum.
{quote}
I suppose statsMap is used here to gather table statistics, since Hive uses 
Counter as one option for storing table statistics. In general, Hive can 
register a SparkCounter with the Enum class name as the group name and the Enum 
name as the counter name, which is why the SparkCounters API supports creating, 
getting, and incrementing counters with an Enum parameter.
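
The enum-to-name mapping described in this reply can be sketched as below. This is only an illustration, not the SparkCounters class from the patch: the class name EnumCounterSketch, the enum DemoCounter, and all method names are hypothetical, and the values live in plain AtomicLong objects in one JVM rather than in Spark Accumulators.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch only; this is not the SparkCounters class from the patch.
final class EnumCounterSketch {
    // Made-up enum standing in for an operator's counter enum.
    enum DemoCounter { ROWS_SEEN, ROWS_SKIPPED }

    // group name -> (counter name -> value)
    private final Map<String, Map<String, AtomicLong>> groups = new ConcurrentHashMap<>();

    // The group name is the enum's declaring class name.
    static String groupOf(Enum<?> key) {
        return key.getDeclaringClass().getName();
    }

    // Enum overload: the counter name is the enum constant's name.
    void createCounter(Enum<?> key) {
        createCounter(groupOf(key), key.name());
    }

    void createCounter(String group, String name) {
        groups.computeIfAbsent(group, g -> new ConcurrentHashMap<>())
              .putIfAbsent(name, new AtomicLong());
    }

    void increment(Enum<?> key, long delta) {
        lookup(groupOf(key), key.name()).addAndGet(delta);
    }

    long getValue(Enum<?> key) {
        return lookup(groupOf(key), key.name()).get();
    }

    // Fails loudly when a counter was never registered.
    private AtomicLong lookup(String group, String name) {
        Map<String, AtomicLong> g = groups.get(group);
        AtomicLong c = (g == null) ? null : g.get(name);
        if (c == null) {
            throw new IllegalStateException("Counter not registered: " + group + "/" + name);
        }
        return c;
    }

    public static void main(String[] args) {
        EnumCounterSketch counters = new EnumCounterSketch();
        counters.createCounter(DemoCounter.ROWS_SEEN);
        counters.increment(DemoCounter.ROWS_SEEN, 3);
        counters.increment(DemoCounter.ROWS_SEEN, 2);
        System.out.println(counters.getValue(DemoCounter.ROWS_SEEN)); // prints 5
    }
}
```

In this sketch an unregistered counter raises an exception instead of being created on first use, which mirrors the idea of registering counters up front. Whether the patch behaves the same way is not stated here.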
{quote}
4. Maybe we should always use HiveConf.ConfVars.HIVECOUNTERGROUP as the group 
name, rather than the enum class name (key.getDeclaringClass().getName())?
{quote}
The group and counter names are all inherited from their MR/Tez counterparts, 
where counters are organized into different groups. I think we should keep that 
grouping consistent if it makes sense.
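
To make the trade-off in point 4 concrete, here is a small sketch comparing the two naming schemes. Everything in it is hypothetical (the class GroupNamingSketch, the two enums, and the "SHARED_GROUP" string, which merely stands in for a single configured group name); it only shows how the resulting counter keys differ.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch comparing two ways of choosing a counter's group name.
final class GroupNamingSketch {
    // Made-up enums standing in for counter enums declared by different operators.
    enum ScanCounter { ERRORS }
    enum WriteCounter { ERRORS }

    // Option A: one group per enum class.
    static String perEnumKey(Enum<?> key) {
        return key.getDeclaringClass().getName() + "::" + key.name();
    }

    // Option B: every counter goes into one shared group.
    static String sharedGroupKey(String group, Enum<?> key) {
        return group + "::" + key.name();
    }

    public static void main(String[] args) {
        Set<String> perEnum = new LinkedHashSet<>();
        perEnum.add(perEnumKey(ScanCounter.ERRORS));
        perEnum.add(perEnumKey(WriteCounter.ERRORS));

        Set<String> shared = new LinkedHashSet<>();
        shared.add(sharedGroupKey("SHARED_GROUP", ScanCounter.ERRORS));
        shared.add(sharedGroupKey("SHARED_GROUP", WriteCounter.ERRORS));

        System.out.println(perEnum.size()); // prints 2: the counters stay distinct
        System.out.println(shared.size());  // prints 1: same-named constants collide
    }
}
```

With one group per enum class, identically named constants from different enums remain separate counters. With a single shared group, the counter name alone has to be unique.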

> Support Hive Counter to collect spark job metric[Spark Branch]
> --------------------------------------------------------------
>
>                 Key: HIVE-8456
>                 URL: https://issues.apache.org/jira/browse/HIVE-8456
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Chengxiang Li
>            Assignee: Chengxiang Li
>              Labels: Spark-M3
>         Attachments: HIVE-8456.1-spark.patch, HIVE-8456.2-spark.patch
>
>
> Several Hive query metrics in Hive operators are collected by Hive Counter, 
> such as CREATEDFILES and DESERIALIZE_ERRORS. Besides, Hive uses Counter as an 
> option to collect table stats info. Spark supports Accumulator, which is 
> pretty similar to Hive Counter, so we could try to enable Hive Counter based 
> on it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
