[
https://issues.apache.org/jira/browse/MAPREDUCE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730677#action_12730677
]
Ravi Gummadi commented on MAPREDUCE-743:
----------------------------------------
When compressed files are given as input to maps, the map progress is
effectively never updated. The uncompressed size of the input file is not
known, so it is taken to be Long.MAX_VALUE, and the resulting progress of a
map task with a compressed input file is ignored because it is only a small
multiple of 1/Long.MAX_VALUE. The progress values seen are of the order of
10^-17 to 10^-11.
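To illustrate the arithmetic, here is a minimal sketch, assuming progress is
computed as bytes read divided by total input size. The class and method
names (ProgressSketch, fraction) are hypothetical and are not taken from the
Hadoop code.

```java
class ProgressSketch {
    // Hypothetical helper: fraction of the input consumed so far, capped at 1.
    static float fraction(long bytesRead, long totalBytes) {
        if (totalBytes <= 0) {
            return 0.0f;
        }
        return Math.min(1.0f, (float) ((double) bytesRead / (double) totalBytes));
    }

    public static void main(String[] args) {
        // Stand-in for "uncompressed size unknown".
        long unknownSize = Long.MAX_VALUE;
        long[] bytesRead = {100L, 1L << 20, 1L << 30};
        for (long n : bytesRead) {
            System.out.printf("read %d bytes -> progress %e%n",
                    n, fraction(n, unknownSize));
        }
    }
}
```

Even after reading 1 GB, the fraction stays far below anything a progress
display would show, which matches the tiny values reported above.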
I found a page on the web,
http://www.abeel.be/content/determine-uncompressed-size-gzip-file , which says
that the last 4 bytes of a gzipped file contain the uncompressed file size.
But this works only if the size is < 4GB, since that field is only 32 bits
wide.
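A minimal sketch of that approach, assuming a single-member gzip file on the
local file system (not HDFS). The class and method names (GzipIsize,
decodeIsize, readIsize) are hypothetical. The trailer field is stored
little-endian and holds the uncompressed size modulo 2^32, so the result is
wrong for inputs of 4GB or more.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

class GzipIsize {
    // Decode the 4-byte little-endian size field into an unsigned value.
    static long decodeIsize(byte[] trailer) {
        return (trailer[0] & 0xFFL)
                | ((trailer[1] & 0xFFL) << 8)
                | ((trailer[2] & 0xFFL) << 16)
                | ((trailer[3] & 0xFFL) << 24);
    }

    // Read the last 4 bytes of a local gzip file and decode them.
    static long readIsize(String path) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            // A gzip member has a 10-byte header and an 8-byte trailer.
            if (file.length() < 18) {
                throw new IOException("File too short to be gzip: " + path);
            }
            byte[] trailer = new byte[4];
            file.seek(file.length() - 4);
            file.readFully(trailer);
            return decodeIsize(trailer);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readIsize(args[0]));
    }
}
```

If several gzip members are concatenated into one file, the trailer describes
only the last member, so this value would underestimate the total size.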
Any thoughts on getting the uncompressed file size of compressed files (at
least for gzipped files)?
> Progress of map phase in map task is not updated properly
> ---------------------------------------------------------
>
> Key: MAPREDUCE-743
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-743
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: task
> Affects Versions: 0.21.0
> Reporter: Ravi Gummadi
> Assignee: Ravi Gummadi
> Fix For: 0.21.0
>
> Attachments: MR-743.patch, MR-743.v1.patch
>
>
> Progress of map phase in map task is not updated properly. The progress set
> by TrackedRecordReader and NewTrackingRecordReader should be set on the
> progress object of the map phase. It was being set as the progress of the
> whole task and, because the task is divided into phases, it was not counted
> as part of the map task progress.