[jira] Commented: (HIVE-964) handle skewed keys for a join in a separate job

Ning Zhang (JIRA) Thu, 14 Jan 2010 16:13:17 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800445#action_12800445
 ]


Ning Zhang commented on HIVE-964:
---------------------------------

Some more comments:

1) RowContainer.java:134 and 207: can you define an enum in HiveConf and use 
that instead of the string here?
2) RowContainer.java:147: the if condition should always be true due to the 
assertion in line 144, so the if should be removed. Also, in setSerDe, dummyRow 
doesn't need to be set, since it will be supplied by the caller (e.g., 
CommonJoinOperator), which constructs the dummy row and passes it in through 
add(). Please take a look at add() line 165.
3) Please move the variable declarations in lines 171-177 to the beginning of 
the class, where most variables are declared, and add a brief comment on each 
of them.
4) The firstCalled boolean should be cleared in add(); otherwise the following 
call sequence may give wrong results: add, first, add, next, next.
5) in first(), the closeWriter(), closeReader() are called for each first(), 
this may cause bad performance when the RowContainer is iterated many times and 
there is no 
6) InputFormat in line 204. It could be very expensive if the RowContainer is 
iterated many times
7) Can you rename the variable originalReadBlock to firstBlock, which is easier 
to understand?
8) In nextBlock, Writable val is a new instance of serde for every new block; 
can we reuse the serde?
9) The key is inserted into each row as the first element before spillBlock and 
after nextBlock. This is too expensive given that the row is an ArrayList. Zheng 
suggested using UnionStructObjectInspector to handle the key and value 
separately.
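
To make comment 4 concrete, here is a minimal sketch of the behaviour I have in 
mind. ReplayableRows is a hypothetical in-memory stand-in, not the RowContainer 
in the patch, and throwing on a stale next() is my assumption about one 
reasonable way to surface the problem. The point is only that add() clears 
firstCalled, so the sequence add, first, add, next cannot silently continue an 
outdated iteration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for RowContainer (no spilling to disk).
final class ReplayableRows<T> {
    private final List<T> rows = new ArrayList<>();
    private int cursor = 0;
    private boolean firstCalled = false;

    void add(T row) {
        rows.add(row);
        // Clearing the flag marks any iteration in progress as stale.
        firstCalled = false;
    }

    // Restarts iteration and returns the first row, or null if there are none.
    T first() {
        firstCalled = true;
        cursor = 0;
        return next();
    }

    // Returns the next row, or null once all rows have been returned.
    T next() {
        if (!firstCalled) {
            throw new IllegalStateException("call first() again after add()");
        }
        return cursor < rows.size() ? rows.get(cursor++) : null;
    }
}
```

With this sketch, add, first, add, next fails fast on the last call, while add, 
first, add, first, next returns the rows in insertion order.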

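On comment 9: ArrayList.add(0, x) shifts every existing element, so prepending 
the key costs time proportional to the row width for every row. The sketch 
below shows the general idea of keeping key and value separate and exposing 
them through one read-only view. It is plain Java for illustration only; 
KeyValueRowView is a hypothetical name and this is not the 
UnionStructObjectInspector API.

```java
import java.util.AbstractList;
import java.util.List;

// Read-only view that presents the key fields followed by the value fields
// without copying either list or shifting any elements.
final class KeyValueRowView extends AbstractList<Object> {
    private final List<?> key;
    private final List<?> value;

    KeyValueRowView(List<?> key, List<?> value) {
        this.key = key;
        this.value = value;
    }

    @Override
    public Object get(int index) {
        if (index < 0 || index >= size()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        // Indices below key.size() map to the key, the rest to the value.
        return index < key.size() ? key.get(index) : value.get(index - key.size());
    }

    @Override
    public int size() {
        return key.size() + value.size();
    }
}
```

Building the view is constant-time per row, and the underlying value list is 
never modified, so nothing has to be undone after nextBlock.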

> handle skewed keys for a join in a separate job
> -----------------------------------------------
>
>                 Key: HIVE-964
>                 URL: https://issues.apache.org/jira/browse/HIVE-964
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: He Yongqiang
>         Attachments: hive-964-2009-12-17.txt, hive-964-2009-12-28-2.patch, 
> hive-964-2009-12-29-4.patch, hive-964-2010-01-08.patch, 
> hive-964-2010-01-13-2.patch
>
>
> The skewed keys can be written to a temporary table or file, and a followup 
> conditional task can be used to perform the join on those keys.
> As a first step, JDBM can be used for those keys

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
