[
https://issues.apache.org/jira/browse/HADOOP-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Arun C Murthy updated HADOOP-3131:
----------------------------------
Status: Open (was: Patch Available)
Matei, sorry I missed this piece the first time around:
{noformat}
+      for (Segment<K, V> s : segmentsToMerge) {
+        totalBytesProcessed += s.getPosition(); // Count initial bytes read
+      }
+      if (totalBytes != 0) {
+        mergeProgress.set(totalBytesProcessed * progPerByte);
+      } else {
+        mergeProgress.set(1.0f);
+      }
{noformat}
At best it reports progress slightly early (i.e. before the final merge
begins), and at worst it yields a completely wrong progress value during the
merge of intermediate map-outputs, since the output for all reduces sits in a
single file. Hence {{s.getPosition}} is hopelessly off as a measure of merge
progress... I vote we just do away with that block.
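To illustrate the failure mode, here is a minimal, hypothetical sketch (the class and field names are mine, not Hadoop's): when a segment begins partway into a spill file shared by several reduces, {{getPosition}} returns an absolute offset that already includes other reduces' bytes, so seeding "bytes processed" from it inflates progress well past 100%.

```java
// Hypothetical sketch of the problem; Segment here is a stand-in, not
// Hadoop's org.apache.hadoop.mapred.Merger.Segment.
public class MergeProgressSketch {
    // A segment that starts deep inside a shared spill file: its absolute
    // position counts bytes belonging to *other* reduces too.
    record Segment(long startOffset, long length) {
        long getPosition() { return startOffset; } // absolute, not per-segment
    }

    public static void main(String[] args) {
        // Two segments for this reduce, each beginning well into a spill
        // file that also holds output destined for other reduces.
        Segment[] segmentsToMerge = {
            new Segment(4_000_000, 1_000_000),
            new Segment(6_000_000, 1_000_000),
        };

        long totalBytes = 0; // bytes this merge will actually read
        for (Segment s : segmentsToMerge) {
            totalBytes += s.length();
        }
        float progPerByte = 1.0f / totalBytes;

        // The questioned block: seed progress from absolute file positions.
        long totalBytesProcessed = 0;
        for (Segment s : segmentsToMerge) {
            totalBytesProcessed += s.getPosition();
        }

        float progress = totalBytesProcessed * progPerByte;
        System.out.println("initial merge progress = " + progress); // 5.0, i.e. 500%
    }
}
```

With segment lengths totaling 2 MB but start offsets summing to 10 MB, the merge reports 500% progress before reading a single byte.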
> enabling BLOCK compression for map outputs breaks the reduce progress counters
> ------------------------------------------------------------------------------
>
> Key: HADOOP-3131
> URL: https://issues.apache.org/jira/browse/HADOOP-3131
> Project: Hadoop Core
> Issue Type: Bug
> Affects Versions: 0.17.1, 0.17.0, 0.17.2, 0.18.0, 0.19.0
> Reporter: Colin Evans
> Assignee: Matei Zaharia
> Fix For: 0.19.0
>
> Attachments: HADOOP-3131-v2.patch, HADOOP-3131-v3.patch,
> HADOOP-3131-v4.patch, HADOOP-3131-v5.patch, merge-progress-trunk.patch,
> merge-progress.patch, Picture 1.png
>
>
> Enabling map output compression and setting the compression type to BLOCK
> causes the progress counters during the reduce to go crazy and report
> progress counts over 100%.
> This is problematic for speculative execution, since the framework thinks
> the affected tasks are making good progress.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.