[jira] [Commented] (HADOOP-6109) Handle large (several MB) text input lines in a reasonable amount of time

Daniel Dai (Commented) (JIRA) Thu, 01 Dec 2011 18:03:03 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161360#comment-13161360
 ]


Daniel Dai commented on HADOOP-6109:
------------------------------------

Note that this may cause backward-compatibility issues. We hit this in Pig 
code, where Pig calls getBytes() and ignores the length. Is it possible to 
keep getBytes() backward compatible and add a getRawBytes() for those who 
want the performance and understand the risk?
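
To illustrate the kind of breakage described here, below is a minimal, 
self-contained sketch. ReusableText is a hypothetical stand-in written for 
this example, not Hadoop's Text class, and the helper names are assumptions. 
It only models the general idea of a reusable backing buffer whose valid 
region ends at getLength(): a caller that decodes the whole array picks up 
stale bytes left over from an earlier, longer value.

```java
import java.nio.charset.StandardCharsets;

class BackingBufferSketch {

    /** Hypothetical reusable buffer; a stand-in for illustration, not Hadoop's Text. */
    static final class ReusableText {
        private byte[] bytes = new byte[0];
        private int length = 0;

        void set(String s) {
            byte[] src = s.getBytes(StandardCharsets.UTF_8);
            if (src.length > bytes.length) {
                bytes = new byte[src.length]; // grow only when the new value does not fit
            }
            System.arraycopy(src, 0, bytes, 0, src.length);
            length = src.length; // bytes past this index may be stale
        }

        /** Returns the backing array itself; only [0, getLength()) is valid. */
        byte[] getBytes() { return bytes; }

        int getLength() { return length; }
    }

    /** Buggy pattern: decodes the whole backing array and ignores the length. */
    static String decodeIgnoringLength(ReusableText t) {
        return new String(t.getBytes(), StandardCharsets.UTF_8);
    }

    /** Safe pattern: decodes only the valid prefix of the backing array. */
    static String decodeWithLength(ReusableText t) {
        return new String(t.getBytes(), 0, t.getLength(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ReusableText t = new ReusableText();
        t.set("hello world"); // buffer grows to 11 bytes
        t.set("hi");          // buffer is reused; 9 stale bytes remain after "hi"
        System.out.println(decodeIgnoringLength(t)); // prints "hillo world"
        System.out.println(decodeWithLength(t));     // prints "hi"
    }
}
```

Under these assumptions, callers that always pair getBytes() with 
getLength() are unaffected by buffer reuse, while callers that ignore the 
length would need an exact-length copy to keep working.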
                
> Handle large (several MB) text input lines in a reasonable amount of time
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-6109
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6109
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.19.0
>         Environment: Linux 2.6 kernel, java 1.6 AMD Dual-Core Opteron 2.6GHz 
> with 1M L1/L2 cache 1.8G RAM
>            Reporter: thushara wijeratna
>            Assignee: thushara wijeratna
>             Fix For: 0.21.0
>
>         Attachments: HADOOP-1234.patch, HADOOP-1234.patch
>
>
> problem:
> =======
> Hadoop was timing out on a simple pass-through job (with the default 
> 10-minute timeout).
> cause:
> =====
> I traced this to how Text lines are processed inside 
> org.apache.hadoop.util.LineReader.
> I have a fix: a task that ran for more than 20 minutes and still failed to 
> complete now finishes in under 30 s.
> I attach the patch (for trunk).
> the problem traces:
> ================
> hadoop version: 0.19.0
> userlogs on slave node:
> 2009-05-29 13:57:33,551 WARN org.apache.hadoop.mapred.TaskRunner: Parent 
> died.  Exiting attempt_200905281652_0013_m_000006_1
> [root@domU-12-31-38-01-7C-92 attempt_200905281652_0013_m_000006_1]#
> Tellingly, the last input line processed right before this WARN is 19K. (I 
> log the full input line in the map function for debugging.)
> output on map-reduce task:
> Task attempt_200905281652_0013_m_000006_2 failed to report status for 600 
> seconds. Killing!
> 09/05/29 14:08:01 INFO mapred.JobClient:  map 99% reduce 32%
> 09/05/29 14:18:05 INFO mapred.JobClient:  map 98% reduce 32%
> java.io.IOException: Job failed!
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1217)
>     at 
> com.adxpose.data.mr.DailyHeatmapAggregator.run(DailyHeatmapAggregator.java:547)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at 
> com.adxpose.data.mr.DailyHeatmapAggregator.main(DailyHeatmapAggregator.java:553)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
>     at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>     at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
