[ https://issues.apache.org/jira/browse/HIVE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889290#action_12889290 ]
Ning Zhang commented on HIVE-1463:
----------------------------------

A couple of questions:

1) In Utilities.getTaskId(), the pattern looks like '.*_[mr]_0{0,5}', which works for Hadoop 0.20. But Hadoop uses different file name patterns for different Hadoop versions and execution modes (local/MR). HIVE-1416 is trying to move this function to the shims, but it has some diffs and is not complete yet.

2) Previously, the file names were padded with 0s before the task ID, so task_00004 came before task_00033 in string order; now task_4 comes after task_33, since '4' sorts after '3'. This may introduce a problem in bucketed joins, where the files corresponding to the same bucket are joined together: if the two tables use different naming conventions, the buckets may be mismatched. I think Namit and Yongqiang are more familiar with that. Can you two comment?

> hive output file names are unnecessarily large
> ----------------------------------------------
>
>                 Key: HIVE-1463
>                 URL: https://issues.apache.org/jira/browse/HIVE-1463
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Joydeep Sen Sarma
>         Attachments: hive-1463.1.patch
>
>
> Hive's output files are named like this:
> attempt_201006221843_431854_r_000000_0
> Out of all of this goop, only one character ('0') would have sufficed. We
> should fix this. This would help environments with namenode memory
> constraints.

--
This message is automatically generated by JIRA.
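The ordering concern in point 2 can be illustrated with a small Python sketch. It is not Hive's code: the regex is a simplified, hypothetical stand-in loosely modelled on the pattern quoted in point 1, and `task_id`, `pair_buckets` and `pair_buckets_numeric` are hypothetical helpers. The sketch assumes that bucket files of two tables are paired by position after a plain string sort of their names, which is the scenario the comment worries about.

```python
import re

# Hypothetical, simplified pattern (not Hive's actual one): capture the task
# number after "_m_" or "_r_", dropping leading zeros, then the attempt suffix.
TASK_ID = re.compile(r".*_[mr]_0*(\d+)_\d+$")


def task_id(attempt_name: str) -> str:
    """Return the unpadded task number, e.g. '4' for '..._r_000004_0'."""
    m = TASK_ID.match(attempt_name)
    if m is None:
        raise ValueError(f"unrecognized file name: {attempt_name}")
    return m.group(1)


def pair_buckets(files_a, files_b):
    """Pair the files of two bucketed tables by position after a string sort."""
    return list(zip(sorted(files_a), sorted(files_b)))


def pair_buckets_numeric(files_a, files_b):
    """Same pairing, but ordered by the numeric task id, so padding is irrelevant."""
    return list(zip(sorted(files_a, key=int), sorted(files_b, key=int)))


attempts = [
    "attempt_201006221843_431854_r_000004_0",
    "attempt_201006221843_431854_r_000033_0",
]
old_names = [task_id(a).zfill(6) for a in attempts]  # old style: zero-padded
new_names = [task_id(a) for a in attempts]           # new style: bare number

for a, b in pair_buckets(old_names, new_names):
    print(a, "<->", b)
# 000004 <-> 33
# 000033 <-> 4
```

With string sorting, bucket 4 of one table is paired with bucket 33 of the other. Ordering by the numeric value (as in the hypothetical `pair_buckets_numeric`) would pair them consistently whatever the padding; whether that is the right fix for Hive is the question the comment leaves to Namit and Yongqiang.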