[
https://issues.apache.org/jira/browse/NUTCH-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17911046#comment-17911046
]
ASF GitHub Bot commented on NUTCH-3072:
---------------------------------------
sebastian-nagel merged PR #832:
URL: https://github.com/apache/nutch/pull/832
> Fetcher to stop QueueFeeder if aborting with "hung threads"
> -----------------------------------------------------------
>
> Key: NUTCH-3072
> URL: https://issues.apache.org/jira/browse/NUTCH-3072
> Project: Nutch
> Issue Type: Bug
> Components: fetcher
> Affects Versions: 1.20
> Reporter: Sebastian Nagel
> Assignee: Sebastian Nagel
> Priority: Major
> Fix For: 1.21
>
>
> Fetcher is shutting down if there is no progress (not a single URL fetched)
> during half of the MapReduce task timeout, see fetcher.thread.timeout.divisor
> and NUTCH-1057. Before the shut-down Fetcher reports the active
> FetcherThreads as "hung threads" and drops existing FetchQueues. After that
> the task continues with sorting and merging the spilled data. FetcherThreads
> and also the QueueFeeder might be still running in this moment, which opens
> potential concurrency issues when a FetcherThread writes output data while
> the output is already in the process of sorting.
> Fetcher should stop the QueueFeeder and/or make sure it isn't alive anymore.
> In addition, a short wait (one second) should help FetcherThreads to shut
> down.
> The issue was observed while testing a solution for NUTCH-3067.
> {noformat}
> 2024-10-02 09:33:18,796 INFO [main] fetcher.Fetcher: -activeThreads=120,
> spinWaiting=119, fetchQueues.totalSize=12000, fetchQueues.getQueueCount=9884
> 2024-10-02 09:33:18,797 WARN [main] fetcher.Fetcher: Aborting with 120 hung
> threads.
> ...
> 2024-10-02 09:33:18,828 WARN [main] fetcher.Fetcher: Aborting with 12000
> queued fetch items in 9884 queues (queue feeder still alive).
> 2024-10-02 09:33:18,828 DEBUG [FetcherThread] fetcher.FetcherThread:
> FetcherThread spin-waiting ...
> ... (reporting dropped queues)
> 2024-10-02 09:33:18,903 INFO [main] fetcher.FetchItemQueues: Emptied all
> queues: 9279 queues with 12000 items
> 2024-10-02 09:33:18,906 DEBUG [FetcherThread] fetcher.FetcherThread:
> FetcherThread spin-waiting ...
> 2024-10-02 09:33:18,906 INFO [main] mapred.MapTask: Starting flush of map
> output
> 2024-10-02 09:33:18,906 INFO [main] mapred.MapTask: Spilling map output
> 2024-10-02 09:33:18,906 INFO [main] mapred.MapTask: bufstart = 124101175;
> bufend = 177094062; bufvoid = 314572800
> 2024-10-02 09:33:18,906 INFO [main] mapred.MapTask: kvstart =
> 31025288(124101152); kvend = 30983880(123935520); length = 41409/19660800
> 2024-10-02 09:33:18,907 DEBUG [FetcherThread] fetcher.FetcherThread:
> FetcherThread spin-waiting ...
> ...
> 2024-10-02 09:33:19,292 DEBUG [FetcherThread] fetcher.FetcherThread:
> FetcherThread spin-waiting ...
> 2024-10-02 09:33:19,294 INFO [main] mapred.Merger: Merging 4 sorted segments
> 2024-10-02 09:33:19,295 INFO [main] mapred.Merger: Down to the last
> merge-pass, with 4 segments left of total size: 979973 bytes
> 2024-10-02 09:33:19,296 DEBUG [FetcherThread] fetcher.FetcherThread:
> FetcherThread spin-waiting ...
> ...
> 2024-10-02 09:33:19,478 DEBUG [FetcherThread] fetcher.FetcherThread:
> FetcherThread spin-waiting ...
> 2024-10-02 09:33:19,480 DEBUG [QueueFeeder] fetcher.QueueFeeder: -feeding
> 12000 input urls ...
> 2024-10-02 09:33:19,480 DEBUG [FetcherThread] fetcher.FetcherThread:
> FetcherThread spin-waiting ...
> 2024-10-02 09:33:19,481 DEBUG [FetcherThread] fetcher.FetcherThread:
> FetcherThread spin-waiting ...
> 2024-10-02 09:33:19,481 ERROR [QueueFeeder] fetcher.QueueFeeder: QueueFeeder
> error reading input, record 89118
> java.io.IOException: Stream closed
> at
> org.apache.hadoop.io.compress.DecompressorStream.checkStream(DecompressorStream.java:184)
> ...
> at org.apache.nutch.fetcher.QueueFeeder.run(QueueFeeder.java:120)
> ...
> 2024-10-02 09:33:19,507 INFO [FetcherThread] fetcher.FetcherThread:
> FetcherThread 1285 has no more work available
> 2024-10-02 09:33:19,507 INFO [FetcherThread] fetcher.FetcherThread:
> FetcherThread 1285 -finishing thread FetcherThread, activeThreads=104
> 2024-10-02 09:33:19,507 INFO [main] mapred.Merger: Merging 4 sorted segments
> 2024-10-02 09:33:19,508 INFO [FetcherThread] fetcher.FetcherThread:
> FetcherThread 319 has no more work available
> 2024-10-02 09:33:19,508 INFO [FetcherThread] fetcher.FetcherThread:
> FetcherThread 319 -finishing thread FetcherThread, activeThreads=103
> 2024-10-02 09:33:19,509 INFO [main] mapred.Merger: Down to the last
> merge-pass, with 4 segments left of total size: 959873 bytes
> 2024-10-02 09:33:19,509 INFO [FetcherThread] fetcher.FetcherThread:
> FetcherThread 1112 has no more work available
> 2024-10-02 09:33:19,509 INFO [FetcherThread] fetcher.FetcherThread:
> FetcherThread 1112 -finishing thread FetcherThread, activeThreads=102
> ...
> {noformat}
> Later on, one of the reducer tasks failed when reading data. This caused the
> job to fail:
> {noformat}
> 2024-10-02 10:37:31,347 INFO mapreduce.Job: map 100% reduce 92%
> 2024-10-02 10:37:31,347 INFO mapreduce.Job: Task Id :
> attempt_1727715735602_0060_r_000007_1, Status : FAILED
> Error: java.lang.ArrayIndexOutOfBoundsException: Index 114 out of bounds for
> length 25
> at
> org.apache.nutch.util.GenericWritableConfigurable.readFields(GenericWritableConfigurable.java:49)
> ...
> 2024-10-02 11:08:34,260 ERROR fetcher.Fetcher: Fetcher:
> java.lang.RuntimeException: Fetcher job did not succeed, job id:
> job_1727715735602_0060, job status: FAILED, reason: Task failed
> task_1727715735602_0060_r_000007
> Job failed as tasks failed. failedMaps:0 failedReduces:1 killedMaps:0
> killedReduces: 0
> {noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)