[ https://issues.apache.org/jira/browse/NUTCH-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17911087#comment-17911087 ]
Hudson commented on NUTCH-3072: ------------------------------- SUCCESS: Integrated in Jenkins build Nutch ยป Nutch-trunk #184 (See [https://ci-builds.apache.org/job/Nutch/job/Nutch-trunk/184/]) NUTCH-3072 Fetcher to stop QueueFeeder if aborting with "hung threads" (snagel: [https://github.com/apache/nutch/commit/f5b9ace76153dd97cd866cc9212f5c67f5a70004]) * (edit) src/java/org/apache/nutch/fetcher/FetchItemQueues.java * (edit) src/java/org/apache/nutch/fetcher/Fetcher.java * (edit) src/java/org/apache/nutch/fetcher/QueueFeeder.java NUTCH-3072 Fetcher to stop QueueFeeder if aborting with "hung threads" (snagel: [https://github.com/apache/nutch/commit/c22616232398357ef1a192d23cc016709412e703]) * (edit) src/java/org/apache/nutch/fetcher/QueueFeeder.java > Fetcher to stop QueueFeeder if aborting with "hung threads" > ----------------------------------------------------------- > > Key: NUTCH-3072 > URL: https://issues.apache.org/jira/browse/NUTCH-3072 > Project: Nutch > Issue Type: Bug > Components: fetcher > Affects Versions: 1.20 > Reporter: Sebastian Nagel > Assignee: Sebastian Nagel > Priority: Major > Fix For: 1.21 > > > Fetcher is shutting down if there is no progress (not a single URL fetched) > during half of the MapReduce task timeout, see fetcher.thread.timeout.divisor > and NUTCH-1057. Before the shut-down Fetcher reports the active > FetcherThreads as "hung threads" and drops existing FetchQueues. After that > the task continues with sorting and merging the spilled data. FetcherThreads > and also the QueueFeeder might be still running in this moment, which opens > potential concurrency issues when a FetcherThread writes output data while > the output is already in the process of sorting. > Fetcher should stop the QueueFeeder and/or make sure it isn't alive anymore. > In addition, a short wait (one second) should help FetcherThreads to shut > down. > The issue was observed while testing a solution for NUTCH-3067. > {noformat} > 2024-10-02 09:33:18,796 INFO [main] fetcher.Fetcher: -activeThreads=120, > spinWaiting=119, fetchQueues.totalSize=12000, fetchQueues.getQueueCount=9884 > 2024-10-02 09:33:18,797 WARN [main] fetcher.Fetcher: Aborting with 120 hung > threads. > ... > 2024-10-02 09:33:18,828 WARN [main] fetcher.Fetcher: Aborting with 12000 > queued fetch items in 9884 queues (queue feeder still alive). > 2024-10-02 09:33:18,828 DEBUG [FetcherThread] fetcher.FetcherThread: > FetcherThread spin-waiting ... > ... (reporting dropped queues) > 2024-10-02 09:33:18,903 INFO [main] fetcher.FetchItemQueues: Emptied all > queues: 9279 queues with 12000 items > 2024-10-02 09:33:18,906 DEBUG [FetcherThread] fetcher.FetcherThread: > FetcherThread spin-waiting ... > 2024-10-02 09:33:18,906 INFO [main] mapred.MapTask: Starting flush of map > output > 2024-10-02 09:33:18,906 INFO [main] mapred.MapTask: Spilling map output > 2024-10-02 09:33:18,906 INFO [main] mapred.MapTask: bufstart = 124101175; > bufend = 177094062; bufvoid = 314572800 > 2024-10-02 09:33:18,906 INFO [main] mapred.MapTask: kvstart = > 31025288(124101152); kvend = 30983880(123935520); length = 41409/19660800 > 2024-10-02 09:33:18,907 DEBUG [FetcherThread] fetcher.FetcherThread: > FetcherThread spin-waiting ... > ... > 2024-10-02 09:33:19,292 DEBUG [FetcherThread] fetcher.FetcherThread: > FetcherThread spin-waiting ... > 2024-10-02 09:33:19,294 INFO [main] mapred.Merger: Merging 4 sorted segments > 2024-10-02 09:33:19,295 INFO [main] mapred.Merger: Down to the last > merge-pass, with 4 segments left of total size: 979973 bytes > 2024-10-02 09:33:19,296 DEBUG [FetcherThread] fetcher.FetcherThread: > FetcherThread spin-waiting ... > ... > 2024-10-02 09:33:19,478 DEBUG [FetcherThread] fetcher.FetcherThread: > FetcherThread spin-waiting ... > 2024-10-02 09:33:19,480 DEBUG [QueueFeeder] fetcher.QueueFeeder: -feeding > 12000 input urls ... > 2024-10-02 09:33:19,480 DEBUG [FetcherThread] fetcher.FetcherThread: > FetcherThread spin-waiting ... > 2024-10-02 09:33:19,481 DEBUG [FetcherThread] fetcher.FetcherThread: > FetcherThread spin-waiting ... > 2024-10-02 09:33:19,481 ERROR [QueueFeeder] fetcher.QueueFeeder: QueueFeeder > error reading input, record 89118 > java.io.IOException: Stream closed > at > org.apache.hadoop.io.compress.DecompressorStream.checkStream(DecompressorStream.java:184) > ... > at org.apache.nutch.fetcher.QueueFeeder.run(QueueFeeder.java:120) > ... > 2024-10-02 09:33:19,507 INFO [FetcherThread] fetcher.FetcherThread: > FetcherThread 1285 has no more work available > 2024-10-02 09:33:19,507 INFO [FetcherThread] fetcher.FetcherThread: > FetcherThread 1285 -finishing thread FetcherThread, activeThreads=104 > 2024-10-02 09:33:19,507 INFO [main] mapred.Merger: Merging 4 sorted segments > 2024-10-02 09:33:19,508 INFO [FetcherThread] fetcher.FetcherThread: > FetcherThread 319 has no more work available > 2024-10-02 09:33:19,508 INFO [FetcherThread] fetcher.FetcherThread: > FetcherThread 319 -finishing thread FetcherThread, activeThreads=103 > 2024-10-02 09:33:19,509 INFO [main] mapred.Merger: Down to the last > merge-pass, with 4 segments left of total size: 959873 bytes > 2024-10-02 09:33:19,509 INFO [FetcherThread] fetcher.FetcherThread: > FetcherThread 1112 has no more work available > 2024-10-02 09:33:19,509 INFO [FetcherThread] fetcher.FetcherThread: > FetcherThread 1112 -finishing thread FetcherThread, activeThreads=102 > ... > {noformat} > Later on, one of the reducer tasks failed when reading data. This caused the > job to fail: > {noformat} > 2024-10-02 10:37:31,347 INFO mapreduce.Job: map 100% reduce 92% > 2024-10-02 10:37:31,347 INFO mapreduce.Job: Task Id : > attempt_1727715735602_0060_r_000007_1, Status : FAILED > Error: java.lang.ArrayIndexOutOfBoundsException: Index 114 out of bounds for > length 25 > at > org.apache.nutch.util.GenericWritableConfigurable.readFields(GenericWritableConfigurable.java:49) > ... > 2024-10-02 11:08:34,260 ERROR fetcher.Fetcher: Fetcher: > java.lang.RuntimeException: Fetcher job did not succeed, job id: > job_1727715735602_0060, job status: FAILED, reason: Task failed > task_1727715735602_0060_r_000007 > Job failed as tasks failed. failedMaps:0 failedReduces:1 killedMaps:0 > killedReduces: 0 > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)