[
https://issues.apache.org/jira/browse/NUTCH-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992437#comment-16992437
]
Sebastian Nagel commented on NUTCH-2756:
----------------------------------------
The killed container was one launched speculatively:
{noformat}
2019-12-10 06:34:22,872 INFO [DefaultSpeculator background processing] org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator: DefaultSpeculator.addSpeculativeAttempt -- we are speculating task_1575911127307_0231_r_000001
2019-12-10 06:34:22,872 INFO [DefaultSpeculator background processing] org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator: We launched 1 speculations. Sleeping 15000 milliseconds.
{noformat}
For a small cluster with only a few tasks, speculative execution makes little
sense; you could disable it by setting the properties mapreduce.map.speculative
and mapreduce.reduce.speculative to false. However, speculative execution
shouldn't lead to broken job output.
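For illustration, a minimal snippet for mapred-site.xml that turns both properties off (the same properties can usually also be passed per job with -D options; adjust to your setup):
{noformat}
<!-- mapred-site.xml: disable speculative execution for map and reduce tasks -->
<property>
  <name>mapreduce.map.speculative</name>
  <value>false</value>
</property>
<property>
  <name>mapreduce.reduce.speculative</name>
  <value>false</value>
</property>
{noformat}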
> Segment Part problem with HDFS on distributed mode
> --------------------------------------------------
>
> Key: NUTCH-2756
> URL: https://issues.apache.org/jira/browse/NUTCH-2756
> Project: Nutch
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.15
> Reporter: Lucas Pauchard
> Priority: Major
> Attachments: 0_byte_file_screenshot.PNG, hadoop-env.sh,
> hdfs-site.xml, mapred-site.xml, syslog, yarn-env.sh, yarn-site.xml
>
>
> During parsing, it sometimes happens that parts of the data on HDFS are
> missing afterwards.
> When I take a look at our HDFS, I see a file with 0 bytes (see
> attachments).
> After that, the CrawlDB complains about this specific (corrupted?) part:
> {panel:title=log_crawl}
> 2019-12-04 22:25:57,454 INFO mapreduce.Job: Task Id : attempt_1575479127636_0047_m_000017_2, Status : FAILED
> Error: java.io.EOFException: hdfs://jobmaster:9000/user/hadoop/crawlmultiokhttp/segment/20191204221308/crawl_parse/part-r-00004 not a SequenceFile
> at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1964)
> at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1923)
> at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1872)
> at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1886)
> at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:54)
> at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:560)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:798)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
> {panel}
> When I check the namenode logs, I don't see any error during the writing of
> the segment part, but one hour later I get the following log:
> {panel:title=log_namenode}
> 2019-12-04 23:23:13,750 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease. Holder: DFSClient_attempt_1575479127636_0046_r_000004_1_1307945884_1, pending creates: 2], src=/user/hadoop/crawlmultiokhttp/segment/20191204221308/parse_data/part-r-00004/index
> 2019-12-04 23:23:13,750 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All existing blocks are COMPLETE, lease removed, file /user/hadoop/crawlmultiokhttp/segment/20191204221308/parse_data/part-r-00004/index closed.
> 2019-12-04 23:23:13,750 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease. Holder: DFSClient_attempt_1575479127636_0046_r_000004_1_1307945884_1, pending creates: 1], src=/user/hadoop/crawlmultiokhttp/segment/20191204221308/crawl_parse/part-r-00004
> 2019-12-04 23:23:13,750 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All existing blocks are COMPLETE, lease removed, file /user/hadoop/crawlmultiokhttp/segment/20191204221308/crawl_parse/part-r-00004 closed.
> {panel}
> This issue is hard to reproduce and I can't figure out what the
> preconditions are. It seems to happen randomly.
> Maybe the problem comes from incorrect handling when the file is closed.