[ https://issues.apache.org/jira/browse/HIVE-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835374#action_12835374 ]
He Yongqiang commented on HIVE-1178:
------------------------------------

Sorry. Reviewed it with Namit this morning offline. My comments for the previous patch are:

    int totalFiles = 1;
    int numFiles = 1;
    if (numBuckets > maxReducers) {
      ...
      if (totalFiles % maxReducers == 0) {
        ...
      } else {
        numFiles = (totalFiles / maxReducers) + 1;
        maxReducers = totalFiles / numFiles;
      }
    }

If numBuckets > maxReducers and numBuckets is not a multiple of maxReducers, the code tries to find how many files each reducer needs to write, and then uses that file count to derive a reducer count. Do we need to guarantee that the calculated reducer count, multiplied by the number of files per reducer, equals the bucket count? If so, the code above cannot guarantee that. For example, with a bucket count of 30 and a maximum of 9 reducers, numFiles will be 4 and maxReducers will be 7, and 4 * 7 = 28 != 30.

The new patch uses a loop to find a good reducer number (a sketch of this approach appears below).

> enforce bucketing for a table
> -----------------------------
>
>                 Key: HIVE-1178
>                 URL: https://issues.apache.org/jira/browse/HIVE-1178
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.6.0
>
>         Attachments: hive.1178.1.patch, hive.1178.2.patch
>
>
> If the table being inserted into is bucketed, Hive currently does not try to enforce that.
> An option should be added to check for that.
> Moreover, the number of buckets can be higher than the maximum number of reducers, in which case a single reducer can write to multiple files.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
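
[Editor's sketch] The following is a minimal, illustrative sketch of the loop-based approach described in the comment above; it is not the code from the attached patches, which are not shown here. It assumes the loop searches for the largest reducer count at most maxReducers that evenly divides numBuckets, so that reducers * filesPerReducer == numBuckets exactly. The class and method names (BucketReducerMath, chooseReducers) are hypothetical.

    // Sketch: pick a reducer count whose product with files-per-reducer
    // exactly equals the bucket count (assumed intent of the new patch).
    public class BucketReducerMath {

      // Returns the largest reducer count <= maxReducers that evenly
      // divides numBuckets. The loop always terminates at r == 1, which
      // divides any numBuckets.
      static int chooseReducers(int numBuckets, int maxReducers) {
        for (int r = maxReducers; r >= 1; r--) {
          if (numBuckets % r == 0) {
            return r;
          }
        }
        return 1; // unreachable, kept for the compiler
      }

      public static void main(String[] args) {
        // The example from the comment: 30 buckets, at most 9 reducers.
        int numBuckets = 30;
        int maxReducers = 9;
        int reducers = chooseReducers(numBuckets, maxReducers);
        int filesPerReducer = numBuckets / reducers;
        // Prints: reducers=6, filesPerReducer=5, product=30
        System.out.println("reducers=" + reducers
            + ", filesPerReducer=" + filesPerReducer
            + ", product=" + (reducers * filesPerReducer));
      }
    }

For the failing example in the comment (30 buckets, 9 max reducers), this loop settles on 6 reducers writing 5 files each, so 6 * 5 = 30 and every bucket file is accounted for, unlike the 4 * 7 = 28 result of the previous patch.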