[ https://issues.apache.org/jira/browse/HIVE-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800445#action_12800445 ]
Ning Zhang commented on HIVE-964:
---------------------------------

Some more comments:

1) RowContainer.java:134 and 207: can you define an enum in HiveConf and use that instead of the string here?
2) RowContainer.java:147: the if condition should always be true because of the assertion in line 144, so the if should be removed. Also, in setSerDe, dummyRow doesn't need to be set, since it will be passed in by the caller (e.g., CommonJoinOperator), which constructs the dummy row and passes it through add(). Please take a look at add() line 165.
3) Please move the variable declarations in 171-177 to the beginning of the class, where most variables are declared, and add a brief comment on each of them.
4) The firstCalled boolean should be cleared in add(); otherwise the following sequence may give wrong results: add, first, add, next, next.
5) In first(), closeWriter() and closeReader() are called for each first(). This may cause bad performance when the RowContainer is iterated many times and there is no
6) InputFormat in line 204. It could be very expensive if the RowContainer is iterated many times.
7) Can you rename the variable originalReadBlock to firstBlock, which is easier to understand?
8) In nextBlock, Writable val is a new instance of serde for every new block. Can we reuse the serde?
9) The key is inserted for each row as the first element before spillBlock and after nextBlock. This is too expensive given that the row is an ArrayList. Zheng suggested using UnionStructObjectInspector to handle key and value separately.
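
To make the sequence in point 4 concrete, here is a minimal sketch. It is not the RowContainer code from the patch: ToyContainer, resetOnAdd, visibleSize, and cursor are hypothetical names, and the toy assumes first() snapshots how many rows are readable. It only shows how a stale firstCalled flag can hide a row that was added after first().

```java
import java.util.ArrayList;
import java.util.List;

public class FirstCalledSketch {
    /** Hypothetical stand-in for a row container; NOT Hive's RowContainer. */
    static class ToyContainer {
        private final List<String> rows = new ArrayList<>();
        private final boolean resetOnAdd; // true = apply the fix suggested in point 4
        private boolean firstCalled = false;
        private int visibleSize = 0;      // rows visible to the reader (snapshot)
        private int cursor = 0;           // index of the next row to return

        ToyContainer(boolean resetOnAdd) {
            this.resetOnAdd = resetOnAdd;
        }

        void add(String row) {
            rows.add(row);
            if (resetOnAdd) {
                firstCalled = false;      // mark the reader snapshot as stale
            }
        }

        String first() {
            visibleSize = rows.size();    // take a fresh snapshot
            cursor = 0;
            firstCalled = true;
            return next();
        }

        String next() {
            if (!firstCalled) {
                visibleSize = rows.size(); // snapshot is stale: refresh it
                firstCalled = true;
            }
            return cursor < visibleSize ? rows.get(cursor++) : null;
        }
    }

    /** Runs the sequence from point 4: add, first, add, next, next. */
    static List<String> run(boolean resetOnAdd) {
        ToyContainer c = new ToyContainer(resetOnAdd);
        List<String> seen = new ArrayList<>();
        c.add("r1");
        seen.add(c.first());
        c.add("r2");
        seen.add(c.next());
        seen.add(c.next());
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("flag kept:    " + run(false));
        System.out.println("flag cleared: " + run(true));
    }
}
```

In this toy, keeping the flag set means the second row is never returned by next(), while clearing it in add() lets the reader pick the row up. The real fix may differ in how it resynchronizes the reader.
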

> handle skewed keys for a join in a separate job
> -----------------------------------------------
>
>                 Key: HIVE-964
>                 URL: https://issues.apache.org/jira/browse/HIVE-964
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: He Yongqiang
>         Attachments: hive-964-2009-12-17.txt, hive-964-2009-12-28-2.patch, hive-964-2009-12-29-4.patch, hive-964-2010-01-08.patch, hive-964-2010-01-13-2.patch
>
>
> The skewed keys can be written to a temporary table or file, and a followup conditional task can be used to perform the join on those keys.
> As a first step, JDBM can be used for those keys

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.