[jira] Commented: (HIVE-1463) hive output file names are unnecessarily large

Ning Zhang (JIRA) Fri, 16 Jul 2010 13:01:47 -0700

    [ https://issues.apache.org/jira/browse/HIVE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889290#action_12889290 ]


Ning Zhang commented on HIVE-1463:
----------------------------------

A couple of questions:
 1) In Utilities.getTaskId(), the pattern looks like '.*_[mr]_0{0,5}', which
works for Hadoop 0.20. However, Hadoop uses different file-name patterns
depending on the Hadoop version and the execution mode (local vs. MR).
HIVE-1416 is trying to move this function to shims, but its implementation
differs somewhat and is not complete yet.
 2) Previously, the file names were zero-padded before the task ID, so
task_00004 sorted before task_00033; now, in string order, task_4 sorts after
task_33. This may introduce a problem in bucketed joins, where the files
corresponding to the same bucket are joined together: if the two tables use
different naming conventions, the buckets may be mismatched. I think Namit and
Yongqiang are more familiar with that code. Can you two comment?
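A minimal Java sketch of the ordering concern in point 2, assuming file names are compared as plain strings and that a bucketed join pairs the i-th file of one table with the i-th file of the other. The class and method names (BucketOrderDemo, sortedNames, pairByPosition) are hypothetical and are not Hive code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BucketOrderDemo {
    // Build file names for the given task IDs, either zero-padded to five digits
    // (the old "task_00004" style) or unpadded (the new "task_4" style), then sort
    // them the way a plain string comparison of a directory listing would.
    static List<String> sortedNames(int[] taskIds, boolean zeroPad) {
        List<String> names = new ArrayList<>();
        for (int id : taskIds) {
            names.add(zeroPad ? String.format("task_%05d", id) : "task_" + id);
        }
        Collections.sort(names); // lexicographic, not numeric
        return names;
    }

    // Pair the i-th file of one table with the i-th file of the other
    // (assumption: buckets are matched by position in the sorted listing).
    static List<String> pairByPosition(List<String> left, List<String> right) {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < Math.min(left.size(), right.size()); i++) {
            pairs.add(left.get(i) + " <-> " + right.get(i));
        }
        return pairs;
    }

    public static void main(String[] args) {
        int[] ids = {4, 33};
        List<String> oldStyle = sortedNames(ids, true);  // [task_00004, task_00033]
        List<String> newStyle = sortedNames(ids, false); // [task_33, task_4]
        // Prints [task_00004 <-> task_33, task_00033 <-> task_4]:
        // bucket 4 of one table ends up paired with bucket 33 of the other.
        System.out.println(pairByPosition(oldStyle, newStyle));
    }
}
```

Comparing the numeric task IDs rather than the raw strings would make the order independent of the naming convention; keeping a fixed-width zero pad only helps if every table uses the same width.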


> hive output file names are unnecessarily large
> ----------------------------------------------
>
>                 Key: HIVE-1463
>                 URL: https://issues.apache.org/jira/browse/HIVE-1463
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Joydeep Sen Sarma
>         Attachments: hive-1463.1.patch
>
>
> Hive's output files are named like this:
> attempt_201006221843_431854_r_000000_0
> Out of all of this goop, only one character ('0') would have sufficed. We
> should fix this. This would help environments with namenode memory
> constraints.
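A minimal Java sketch of the shortening the issue asks for, applying the pattern quoted in point 1 ('.*_[mr]_0{0,5}') to the Hadoop 0.20-style name above. Stripping the trailing attempt number first, and the helper name shortTaskId, are assumptions for illustration, not the code in hive-1463.1.patch. Note that the zero-stripping in the pattern is what produces the unpadded task_4 vs. task_33 ordering raised in point 2.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShortTaskName {
    // Pattern quoted in the comment for Utilities.getTaskId(): everything up to
    // the last "_m_" or "_r_", plus up to five leading zeros of the task number.
    private static final Pattern PREFIX = Pattern.compile(".*_[mr]_0{0,5}");
    // Assumption: a full attempt name ends with an attempt number such as "_0".
    private static final Pattern ATTEMPT_SUFFIX = Pattern.compile("_\\d+$");

    // Hypothetical helper, not the patch's code: shorten a Hadoop 0.20-style
    // attempt name to its task number. Names that do not match the pattern
    // (e.g. other Hadoop versions or modes) are returned unchanged.
    static String shortTaskId(String attemptName) {
        String task = ATTEMPT_SUFFIX.matcher(attemptName).replaceFirst("");
        Matcher m = PREFIX.matcher(task);
        return m.lookingAt() ? task.substring(m.end()) : attemptName;
    }

    public static void main(String[] args) {
        String name = "attempt_201006221843_431854_r_000000_0";
        String shortName = shortTaskId(name);
        System.out.println(shortName); // 0
        System.out.println(name.length() + " -> " + shortName.length() + " chars"); // 38 -> 1 chars
        System.out.println(shortTaskId("attempt_201006221843_431854_r_000033_0")); // 33
    }
}
```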

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
