[ 
https://issues.apache.org/jira/browse/HIVE-440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701677#action_12701677
 ] 

Zheng Shao commented on HIVE-440:
---------------------------------

JoinOperator.java
{code}
+  private int getNextSize(int sz) {
+    // A very simple counter to keep track of join entries for a key
+    if ((sz % 128000) == 0)
+      return sz + 128000;
+    
+    return 2 * sz;
+  }
+
{code}

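EmitInterval is configurable, so there is no guarantee that
"(sz % 128000) == 0" will ever be true: if the starting value never doubles
into an exact multiple of 128000, the size keeps doubling forever. I think it
is better to test "sz >= 128000" instead.
We might want to make the same change in ExecReducer as a precaution.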
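
A minimal sketch of the "sz >= 128000" check suggested above, not the actual
patch. The class name NextSizeSketch and the constant THRESHOLD are
hypothetical and only there to make the snippet compile on its own.

```java
// Sketch only: NextSizeSketch and THRESHOLD are hypothetical names,
// not taken from JoinOperator.java or the attached patch.
final class NextSizeSketch {
  // Point at which growth switches from doubling to fixed steps.
  static final int THRESHOLD = 128000;

  // Double the size until it reaches the threshold, then grow linearly.
  // Unlike a "% 128000 == 0" test, this works for any starting value.
  static int getNextSize(int sz) {
    if (sz >= THRESHOLD) {
      return sz + THRESHOLD;
    }
    return 2 * sz;
  }
}
```

With a starting value of 3, doubling reaches 196608, which is never a multiple
of 128000 but is above the threshold, so the next size is 196608 + 128000
rather than another doubling.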

For ExecReducer, I would prefer to start the counting from 1 instead of 1000.
It's 3 additional lines of log, but it will give us some insight into whether
the operator is blocking on the first row, or the map-reduce framework never
sends the first row to our Operators.
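
A rough sketch of counting from 1, assuming the log interval grows by a factor
of 10 (which would match "3 additional lines": rows 1, 10 and 100 before 1000).
The class RowCountLogSketch, its fields, and the message text are hypothetical
and do not come from ExecReducer.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: all names here are hypothetical, not ExecReducer's own.
final class RowCountLogSketch {
  private long rowCount = 0;
  // Start at 1 so the very first row produces a log line.
  private long nextLogAt = 1;
  // Collected here instead of a real logger to keep the sketch self-contained.
  final List<String> logged = new ArrayList<>();

  // Call once per row received by the reducer.
  void onRow() {
    rowCount++;
    if (rowCount == nextLogAt) {
      logged.add("processed " + rowCount + " rows");
      // Assumed growth factor of 10: 1, 10, 100, 1000, ...
      nextLogAt *= 10;
    }
  }
}
```

If no line for row 1 ever shows up, the first row never reached the operator;
if it shows up and nothing follows, the operator is likely blocking on it.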


> no way to know number of rows for a reducer and join keys
> ---------------------------------------------------------
>
>                 Key: HIVE-440
>                 URL: https://issues.apache.org/jira/browse/HIVE-440
>             Project: Hadoop Hive
>          Issue Type: Wish
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>         Attachments: hive.440.1.patch
>
>
> It is a good debugging tool to dump the number of rows for a reducer, and to
> emit the number of entries for a join key for an internal table

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
