[jira] [Commented] (NUTCH-3072) Fetcher to stop QueueFeeder if aborting with "hung threads"

ASF GitHub Bot (Jira) Wed, 08 Jan 2025 04:17:04 -0800


    [ 
https://issues.apache.org/jira/browse/NUTCH-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17911046#comment-17911046
 ]


ASF GitHub Bot commented on NUTCH-3072:
---------------------------------------

sebastian-nagel merged PR #832:
URL: https://github.com/apache/nutch/pull/832




> Fetcher to stop QueueFeeder if aborting with "hung threads"
> -----------------------------------------------------------
>
>                 Key: NUTCH-3072
>                 URL: https://issues.apache.org/jira/browse/NUTCH-3072
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.20
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Major
>             Fix For: 1.21
>
>
> Fetcher shuts down if there is no progress (not a single URL fetched) 
> during half of the MapReduce task timeout, see fetcher.thread.timeout.divisor 
> and NUTCH-1057. Before shutting down, Fetcher reports the active 
> FetcherThreads as "hung threads" and drops the existing FetchQueues. After 
> that, the task continues with sorting and merging the spilled data. 
> FetcherThreads and also the QueueFeeder might still be running at this 
> moment, which opens up potential concurrency issues when a FetcherThread 
> writes output data while the output is already being sorted.
> Fetcher should stop the QueueFeeder and/or make sure it is no longer alive. 
> In addition, a short wait (one second) should help FetcherThreads to shut 
> down.
> The issue was observed while testing a solution for NUTCH-3067.
> {noformat}
> 2024-10-02 09:33:18,796 INFO [main] fetcher.Fetcher: -activeThreads=120, 
> spinWaiting=119, fetchQueues.totalSize=12000, fetchQueues.getQueueCount=9884
> 2024-10-02 09:33:18,797 WARN [main] fetcher.Fetcher: Aborting with 120 hung 
> threads.
> ...
> 2024-10-02 09:33:18,828 WARN [main] fetcher.Fetcher: Aborting with 12000 
> queued fetch items in 9884 queues (queue feeder still alive).
> 2024-10-02 09:33:18,828 DEBUG [FetcherThread] fetcher.FetcherThread: 
> FetcherThread spin-waiting ...
> ... (reporting dropped queues)
> 2024-10-02 09:33:18,903 INFO [main] fetcher.FetchItemQueues: Emptied all 
> queues: 9279 queues with 12000 items
> 2024-10-02 09:33:18,906 DEBUG [FetcherThread] fetcher.FetcherThread: 
> FetcherThread spin-waiting ...
> 2024-10-02 09:33:18,906 INFO [main] mapred.MapTask: Starting flush of map 
> output
> 2024-10-02 09:33:18,906 INFO [main] mapred.MapTask: Spilling map output
> 2024-10-02 09:33:18,906 INFO [main] mapred.MapTask: bufstart = 124101175; 
> bufend = 177094062; bufvoid = 314572800
> 2024-10-02 09:33:18,906 INFO [main] mapred.MapTask: kvstart = 
> 31025288(124101152); kvend = 30983880(123935520); length = 41409/19660800
> 2024-10-02 09:33:18,907 DEBUG [FetcherThread] fetcher.FetcherThread: 
> FetcherThread spin-waiting ...
> ...
> 2024-10-02 09:33:19,292 DEBUG [FetcherThread] fetcher.FetcherThread: 
> FetcherThread spin-waiting ...
> 2024-10-02 09:33:19,294 INFO [main] mapred.Merger: Merging 4 sorted segments
> 2024-10-02 09:33:19,295 INFO [main] mapred.Merger: Down to the last 
> merge-pass, with 4 segments left of total size: 979973 bytes
> 2024-10-02 09:33:19,296 DEBUG [FetcherThread] fetcher.FetcherThread: 
> FetcherThread spin-waiting ...
> ...
> 2024-10-02 09:33:19,478 DEBUG [FetcherThread] fetcher.FetcherThread: 
> FetcherThread spin-waiting ...
> 2024-10-02 09:33:19,480 DEBUG [QueueFeeder] fetcher.QueueFeeder: -feeding 
> 12000 input urls ...
> 2024-10-02 09:33:19,480 DEBUG [FetcherThread] fetcher.FetcherThread: 
> FetcherThread spin-waiting ...
> 2024-10-02 09:33:19,481 DEBUG [FetcherThread] fetcher.FetcherThread: 
> FetcherThread spin-waiting ...
> 2024-10-02 09:33:19,481 ERROR [QueueFeeder] fetcher.QueueFeeder: QueueFeeder 
> error reading input, record 89118
> java.io.IOException: Stream closed
>         at 
> org.apache.hadoop.io.compress.DecompressorStream.checkStream(DecompressorStream.java:184)
>         ...
>         at org.apache.nutch.fetcher.QueueFeeder.run(QueueFeeder.java:120)
> ...
> 2024-10-02 09:33:19,507 INFO [FetcherThread] fetcher.FetcherThread: 
> FetcherThread 1285 has no more work available
> 2024-10-02 09:33:19,507 INFO [FetcherThread] fetcher.FetcherThread: 
> FetcherThread 1285 -finishing thread FetcherThread, activeThreads=104
> 2024-10-02 09:33:19,507 INFO [main] mapred.Merger: Merging 4 sorted segments
> 2024-10-02 09:33:19,508 INFO [FetcherThread] fetcher.FetcherThread: 
> FetcherThread 319 has no more work available
> 2024-10-02 09:33:19,508 INFO [FetcherThread] fetcher.FetcherThread: 
> FetcherThread 319 -finishing thread FetcherThread, activeThreads=103
> 2024-10-02 09:33:19,509 INFO [main] mapred.Merger: Down to the last 
> merge-pass, with 4 segments left of total size: 959873 bytes
> 2024-10-02 09:33:19,509 INFO [FetcherThread] fetcher.FetcherThread: 
> FetcherThread 1112 has no more work available
> 2024-10-02 09:33:19,509 INFO [FetcherThread] fetcher.FetcherThread: 
> FetcherThread 1112 -finishing thread FetcherThread, activeThreads=102
> ...
> {noformat}
> Later on, one of the reducer tasks failed when reading data. This caused the 
> job to fail:
> {noformat}
> 2024-10-02 10:37:31,347 INFO mapreduce.Job:  map 100% reduce 92%
> 2024-10-02 10:37:31,347 INFO mapreduce.Job: Task Id : 
> attempt_1727715735602_0060_r_000007_1, Status : FAILED
> Error: java.lang.ArrayIndexOutOfBoundsException: Index 114 out of bounds for 
> length 25
>         at 
> org.apache.nutch.util.GenericWritableConfigurable.readFields(GenericWritableConfigurable.java:49)
> ...
> 2024-10-02 11:08:34,260 ERROR fetcher.Fetcher: Fetcher: 
> java.lang.RuntimeException: Fetcher job did not succeed, job id: 
> job_1727715735602_0060, job status: FAILED, reason: Task failed 
> task_1727715735602_0060_r_000007
> Job failed as tasks failed. failedMaps:0 failedReduces:1 killedMaps:0 
> killedReduces: 0
> {noformat}
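
The shutdown order asked for in the description (stop the feeder first, confirm it is no longer alive, only then drop the queues) can be illustrated with a minimal, generic Java sketch. This is not the code merged in PR #832 and it does not use Nutch classes: Feeder, requestStop() and stopAndJoin() are hypothetical names chosen for this illustration, and the bounded queue merely stands in for the fetch queues.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class FeederShutdownSketch {

  /** Hypothetical stand-in for a queue feeder thread; NOT Nutch's QueueFeeder. */
  static class Feeder extends Thread {
    private final BlockingQueue<String> queue;
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);

    Feeder(BlockingQueue<String> queue) {
      super("Feeder");
      this.queue = queue;
      setDaemon(true); // never keep the JVM alive just for the feeder
    }

    /** Asks the feeder to stop and wakes it up if it is blocked on the queue. */
    void requestStop() {
      stopRequested.set(true);
      interrupt();
    }

    @Override
    public void run() {
      int n = 0;
      while (!stopRequested.get()) {
        try {
          // Blocks for up to 50 ms while the bounded queue is full.
          queue.offer("url-" + n++, 50, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
          // Interrupted by requestStop(): loop around and re-check the flag.
        }
      }
    }
  }

  /**
   * Requests a stop and waits for the feeder to terminate.
   * timeoutMillis must be greater than 0, because join(0) waits forever.
   * Returns true if the feeder is no longer alive afterwards.
   */
  static boolean stopAndJoin(Feeder feeder, long timeoutMillis)
      throws InterruptedException {
    feeder.requestStop();
    feeder.join(timeoutMillis);
    return !feeder.isAlive();
  }

  public static void main(String[] args) throws InterruptedException {
    BlockingQueue<String> queue = new LinkedBlockingQueue<>(10);
    Feeder feeder = new Feeder(queue);
    feeder.start();
    Thread.sleep(100); // let the feeder fill the queue

    // 1. Stop the feeder first, so that nothing can refill the queue.
    boolean stopped = stopAndJoin(feeder, 1000);
    // 2. Only then drop the queued items.
    queue.clear();
    System.out.println("feeder stopped: " + stopped);
  }
}
```

In the log above the queues are emptied at 09:33:18,903 while the QueueFeeder is still feeding at 09:33:19,480. Stopping and joining the feeder before clearing the queue is what rules out that interleaving in the sketch.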



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
