[jira] Updated: (HIVE-1463) hive output file names are unnecessarily large

Joydeep Sen Sarma (JIRA) Sat, 17 Jul 2010 02:43:22 -0700

     [ 
https://issues.apache.org/jira/browse/HIVE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Joydeep Sen Sarma updated HIVE-1463:
------------------------------------

    Attachment: 1463.3.patch

this patch fixes all the issues:

1) the regex is expanded to cover both hadoop 0.17 and later releases. in 
0.17, tasks are indeed named _map_ and _reduce_ in local mode.
2) leading zeros in the taskid are not stripped, so the ordering of files is 
not changed by this diff. the filename component being removed is constant 
per map-reduce job (jobid + jobtracker_id etc.).
3) a one-line env setting in the build file that lets us control test 
execution logging from hive-log4j.
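
a rough sketch of what points 1 and 2 amount to, assuming attempt names shaped 
like the one in the issue description. the class name, the pattern and the 
local-mode sample name are illustrative guesses, not the regex in the patch:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hive.
final class ShortTaskName {
    // Assumed shape: <per-job prefix>_<m|r|map|reduce>_<taskid>_<attempt>.
    // Everything up to and including the task-type marker is constant for a
    // job, so only "<taskid>_<attempt>" is kept (leading zeros untouched).
    private static final Pattern ATTEMPT_NAME =
        Pattern.compile("^.*_(?:m|r|map|reduce)_(\\d+_\\d+)$");

    static String shorten(String name) {
        Matcher m = ATTEMPT_NAME.matcher(name);
        // Names that do not look like an attempt id are returned unchanged.
        return m.matches() ? m.group(1) : name;
    }

    public static void main(String[] args) {
        System.out.println(shorten("attempt_201006221843_431854_r_000000_0"));
        // Made-up local-mode style name, only to exercise the _map_ branch.
        System.out.println(shorten("attempt_local_0001_map_000003_0"));
    }
}
```

because the zero padding is kept, sorting the shortened names gives the same 
order as sorting the original names within one job.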

all the tests pass with this patch. the problem with load_dyn_part2.q was due 
to the regex being applied incorrectly: the taskid matching has to be applied 
to the last component of the path name only.

as an aside: replaceTaskIdFromFilename would be simpler and easier to 
understand if it did just this: cut the last component, replace the taskid, 
concat back and return.
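
something along these lines, as a minimal sketch only. the class and method 
names, the '/' separator and the taskid pattern (a digit run at the end of the 
file name, optionally followed by an _<attempt> suffix) are assumptions for 
illustration, not the code in Hive:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of "cut last component, replace taskid, concat back".
final class TaskIdReplaceSketch {
    // Assumed: group 2 is the taskid, group 3 an optional "_<attempt>" suffix.
    private static final Pattern TASK_ID =
        Pattern.compile("^(.*?)(\\d+)(_\\d+)?$");

    static String replaceTaskId(String path, String newTaskId) {
        // 1. cut: split off the last path component.
        int cut = path.lastIndexOf('/') + 1;
        String dir = path.substring(0, cut);
        String name = path.substring(cut);

        // 2. replace: match the taskid inside the file name only, so digits
        //    in directory names are never considered.
        Matcher m = TASK_ID.matcher(name);
        if (!m.matches()) {
            return path; // no taskid found, leave the path untouched
        }
        String attempt = m.group(3) == null ? "" : m.group(3);

        // 3. concat back and return.
        return dir + m.group(1) + newTaskId + attempt;
    }
}
```

for example, replaceTaskId("/tmp/ds=2010-07-17/000000_0", "000005") returns 
"/tmp/ds=2010-07-17/000005_0"; the date in the directory name is never matched.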

> hive output file names are unnecessarily large
> ----------------------------------------------
>
>                 Key: HIVE-1463
>                 URL: https://issues.apache.org/jira/browse/HIVE-1463
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Joydeep Sen Sarma
>         Attachments: 1463.2.patch, 1463.3.patch, hive-1463.1.patch
>
>
> Hive's output files are named like this:
> attempt_201006221843_431854_r_000000_0
> out of all of this goop - only one character '0' would have sufficed. we 
> should fix this. This would help environments with namenode memory 
> constraints.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
