[ https://issues.apache.org/jira/browse/NUTCH-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17911385#comment-17911385 ]

Marcos Gomez commented on NUTCH-3102:
-------------------------------------

[~snagel] Unfortunately I can't provide the CrawlDb, but Nutch (1.19) is 
running on Linux on an ARM architecture with Java 11. It ran flawlessly for 
months, but at some point started to fail when getting the stats. The only 
fix I've found has been to wrap that line in a try-catch and ignore the 
exception, since I don't need that info (see the sketch below).
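
For reference, the workaround looks roughly like this (a minimal sketch; the 
variable names are my assumptions, only the try-catch around the add call 
from the quoted stack trace is what I actually did):

{code:java}
// Hypothetical sketch of the change in CrawlDbStatReducer.reduce(),
// around the add call at CrawlDbReader.java:489. 'tdigest' is the
// accumulating TDigest, 'value' the BytesWritable with a serialized digest.
try {
  tdigest.add(MergingDigest.fromBytes(ByteBuffer.wrap(value.getBytes())));
} catch (IllegalArgumentException e) {
  // "Cannot add NaN to t-digest": skip the bad digest instead of
  // failing the whole -stats job; I don't need the score quantiles.
}
{code}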

> CrawlDbReader -stats fails with Cannot add NaN to t-digest
> ----------------------------------------------------------
>
>                 Key: NUTCH-3102
>                 URL: https://issues.apache.org/jira/browse/NUTCH-3102
>             Project: Nutch
>          Issue Type: Bug
>          Components: crawldb
>    Affects Versions: 1.19
>            Reporter: Marcos Gomez
>            Priority: Major
>             Fix For: 1.21
>
>
> When running in local mode, CrawlDbReader / readdb -stats fails with 
> "java.lang.Exception: java.lang.IllegalArgumentException: Cannot add NaN to 
> t-digest"
>  
> {noformat}
> java.lang.Exception: java.lang.IllegalArgumentException: Cannot add NaN to t-digest
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:492) ~[hadoop-mapreduce-client-common-3.3.4.jar:?]
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:559) ~[hadoop-mapreduce-client-common-3.3.4.jar:?]
> Caused by: java.lang.IllegalArgumentException: Cannot add NaN to t-digest
>     at com.tdunning.math.stats.MergingDigest.add(MergingDigest.java:256) ~[t-digest-3.3.jar:?]
>     at com.tdunning.math.stats.MergingDigest.add(MergingDigest.java:246) ~[t-digest-3.3.jar:?]
>     at com.tdunning.math.stats.AbstractTDigest.add(AbstractTDigest.java:135) ~[t-digest-3.3.jar:?]
>     at org.apache.nutch.crawl.CrawlDbReader$CrawlDbStatReducer.reduce(CrawlDbReader.java:489) ~[apache-nutch-1.19.jar:?]
>     at org.apache.nutch.crawl.CrawlDbReader$CrawlDbStatReducer.reduce(CrawlDbReader.java:422) ~[apache-nutch-1.19.jar:?]
>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) ~[hadoop-mapreduce-client-core-3.3.4.jar:?]
>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:628) ~[hadoop-mapreduce-client-core-3.3.4.jar:?]
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390) ~[hadoop-mapreduce-client-core-3.3.4.jar:?]
>     at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:347) ~[hadoop-mapreduce-client-common-3.3.4.jar:?]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
>     at java.lang.Thread.run(Thread.java:829) ~[?:?]{noformat}
> I added a log statement to find out why this happens; apparently the 
> t-digest is being built from a BytesWritable object holding this value:
> {noformat}
> Error adding scd value: 00 00 00 02 ff f8 00 00 00 00 00 00 ff f8 00 00 00 00 
> 00 00 42 c8 00 00 00 d2 04 1a 00 17 42 8e 00 00 ff c0 00 00 40 40 00 00 ff c0 
> 00 00 42 14 00 00 ff c0 00 00 42 60 00 00 ff c0 00 00 42 aa 00 00 ff c0 00 00 
> 43 a2 80 00 47 af 57 9b 45 7f d0 00 4a cf c0 db 43 7d 00 00 4d ac 61 02 45 72 
> b0 00 4e 8d 9d bd 43 67 00 00 66 e1 9a 9c 45 aa 70 00 ff c0 00 00 45 72 40 00 
> ff c0 00 00 45 d2 88 00 ff c0 00 00 46 10 10 00 ff c0 00 00 46 1f 7c 00 ff c0 
> 00 00 46 0b 98 00 ff c0 00 00 46 31 b8 00 ff c0 00 00 46 12 2c 00 ff c0 00 00 
> 45 cb 40 00 ff c0 00 00 45 6d f0 00 ff c0 00 00 45 78 70 00 ff c0 00 00 45 9e 
> 60 00 ff c0 00 00 45 94 d8 00 ff c0 00 00 00 00 00 00 00 00 00 00{noformat}
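> The leading 8-byte value ff f8 00 00 00 00 00 00 decodes to a double NaN, 
> and the 4-byte pattern ff c0 00 00 that repeats through the dump decodes to 
> a float NaN. A quick check (the two constants are copied straight from the 
> dump above):
> {code:java}
> double d = Double.longBitsToDouble(0xFFF8000000000000L);
> float f = Float.intBitsToFloat(0xFFC00000);
> System.out.println(Double.isNaN(d) + " " + Float.isNaN(f)); // true true
> {code}
> This suggests the serialized data already carries NaN values, which is 
> exactly what MergingDigest.add() rejects.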


