[ https://issues.apache.org/jira/browse/HIVE-440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701677#action_12701677 ]
Zheng Shao commented on HIVE-440:
---------------------------------
JoinOperator.java
{code}
+ private int getNextSize(int sz) {
+ // A very simple counter to keep track of join entries for a key
+ if ((sz % 128000) == 0)
+ return sz + 128000;
+
+ return 2 * sz;
+ }
+
{code}
EmitInterval is configurable, so there is no guarantee that "(sz % 128000) == 0"
will ever be true; for some starting values sz would just keep doubling. It's
better to test "sz >= 128000", I think.
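To make the concern concrete, here is a minimal standalone sketch of the counter with the suggested "sz >= 128000" test. The class name JoinEntryCounter, the CAP constant and the schedule() helper are hypothetical, written only for illustration; they are not part of the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone model of the join-entry logging schedule.
class JoinEntryCounter {
    // Threshold after which growth switches from doubling to fixed steps
    // (128000 is the value used in the quoted patch).
    static final int CAP = 128000;

    // Suggested fix: compare against the cap instead of testing divisibility,
    // so the switch to linear growth happens for any starting interval.
    static int getNextSize(int sz) {
        if (sz >= CAP) {
            return sz + CAP;
        }
        return 2 * sz;
    }

    // Returns the first `count` entry counts at which a log line would be
    // emitted, starting from a (configurable) initial interval.
    static List<Integer> schedule(int start, int count) {
        List<Integer> points = new ArrayList<>();
        int sz = start;
        for (int i = 0; i < count; i++) {
            points.add(sz);
            sz = getNextSize(sz);
        }
        return points;
    }

    public static void main(String[] args) {
        // 1001 has no factor of 5, so 1001 * 2^k is never a multiple of
        // 128000: the original modulo test would never fire for this start.
        System.out.println(schedule(1001, 10));
        // prints [1001, 2002, 4004, 8008, 16016, 32032, 64064, 128128, 256128, 384128]
    }
}
```

With the ">=" test the schedule doubles until it first passes 128000 and then grows by 128000 per step, whatever the starting interval is.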
We might want to change the same logic in ExecReducer as a precaution.
For ExecReducer, I would prefer to start counting from 1 instead of 1000.
It's 3 additional lines of log, but they will give us some insight into whether
the operator is blocking on the first row, or the map-reduce framework never
sends the first row to our operators.
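A minimal sketch of what starting the reducer's row counter at 1 could look like. It assumes, without confirmation from this thread, that the ExecReducer schedule multiplies by 10 until a cap and that the count is logged before the row is handed to the operator tree; under that assumption the 3 extra lines are the log points 1, 10 and 100. ReducerRowLogger, next() and the 1,000,000 cap are hypothetical names and values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a reducer's "processed N rows" log schedule.
class ReducerRowLogger {
    // Assumed cap: beyond this, log every CAP rows instead of every 10x.
    static final long CAP = 1000000L;

    private long rows = 0;
    private long nextLogPoint;
    private final List<Long> logged = new ArrayList<>();

    ReducerRowLogger(long firstLogPoint) {
        this.nextLogPoint = firstLogPoint;
    }

    // Next row count at which to log: 10x growth below the cap, linear above.
    static long next(long cntr) {
        if (cntr >= CAP) {
            return cntr + CAP;
        }
        return 10 * cntr;
    }

    // Called once per row, before the row would be passed to the operators.
    void onRow() {
        rows++;
        if (rows == nextLogPoint) {
            // A real reducer would write a log line here; we record the count.
            logged.add(rows);
            nextLogPoint = next(rows);
        }
    }

    List<Long> loggedCounts() {
        return logged;
    }

    public static void main(String[] args) {
        ReducerRowLogger fromOne = new ReducerRowLogger(1);
        ReducerRowLogger fromThousand = new ReducerRowLogger(1000);
        for (int i = 0; i < 5000; i++) {
            fromOne.onRow();
            fromThousand.onRow();
        }
        System.out.println(fromOne.loggedCounts());      // [1, 10, 100, 1000]
        System.out.println(fromThousand.loggedCounts()); // [1000]
    }
}
```

Under these assumptions, no log line at all means the reducer never received a row, while a line for 1 followed by silence points at the operator stalling on the first row.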
> no way to know number of rows for a reducer and join keys
> ---------------------------------------------------------
>
> Key: HIVE-440
> URL: https://issues.apache.org/jira/browse/HIVE-440
> Project: Hadoop Hive
> Issue Type: Wish
> Components: Query Processor
> Reporter: Namit Jain
> Assignee: Namit Jain
> Attachments: hive.440.1.patch
>
>
> It is a good debugging tool to dump the number of rows for a reducer, and to emit
> the number of entries for a join key for an internal table