Marcos Gomez created NUTCH-3102:
-----------------------------------

             Summary: CrawlDbReader -stats fails with Cannot add NaN to t-digest
                 Key: NUTCH-3102
                 URL: https://issues.apache.org/jira/browse/NUTCH-3102
             Project: Nutch
          Issue Type: Bug
          Components: scoring
    Affects Versions: 1.19
            Reporter: Marcos Gomez


When running in local mode, CrawlDbReader (readdb -stats) fails with
"java.lang.Exception: java.lang.IllegalArgumentException: Cannot add NaN to t-digest":

{noformat}
java.lang.Exception: java.lang.IllegalArgumentException: Cannot add NaN to t-digest
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:492) ~[hadoop-mapreduce-client-common-3.3.4.jar:?]
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:559) ~[hadoop-mapreduce-client-common-3.3.4.jar:?]
Caused by: java.lang.IllegalArgumentException: Cannot add NaN to t-digest
    at com.tdunning.math.stats.MergingDigest.add(MergingDigest.java:256) ~[t-digest-3.3.jar:?]
    at com.tdunning.math.stats.MergingDigest.add(MergingDigest.java:246) ~[t-digest-3.3.jar:?]
    at com.tdunning.math.stats.AbstractTDigest.add(AbstractTDigest.java:135) ~[t-digest-3.3.jar:?]
    at org.apache.nutch.crawl.CrawlDbReader$CrawlDbStatReducer.reduce(CrawlDbReader.java:489) ~[apache-nutch-1.19.jar:?]
    at org.apache.nutch.crawl.CrawlDbReader$CrawlDbStatReducer.reduce(CrawlDbReader.java:422) ~[apache-nutch-1.19.jar:?]
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) ~[hadoop-mapreduce-client-core-3.3.4.jar:?]
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:628) ~[hadoop-mapreduce-client-core-3.3.4.jar:?]
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390) ~[hadoop-mapreduce-client-core-3.3.4.jar:?]
    at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:347) ~[hadoop-mapreduce-client-common-3.3.4.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    at java.lang.Thread.run(Thread.java:829) ~[?:?]
{noformat}
I added a log statement to find out why this happens. Apparently the t-digest is
being built from this BytesWritable value:
{noformat}
Error adding scd value: 00 00 00 02 ff f8 00 00 00 00 00 00 ff f8 00 00 00 00 
00 00 42 c8 00 00 00 d2 04 1a 00 17 42 8e 00 00 ff c0 00 00 40 40 00 00 ff c0 
00 00 42 14 00 00 ff c0 00 00 42 60 00 00 ff c0 00 00 42 aa 00 00 ff c0 00 00 
43 a2 80 00 47 af 57 9b 45 7f d0 00 4a cf c0 db 43 7d 00 00 4d ac 61 02 45 72 
b0 00 4e 8d 9d bd 43 67 00 00 66 e1 9a 9c 45 aa 70 00 ff c0 00 00 45 72 40 00 
ff c0 00 00 45 d2 88 00 ff c0 00 00 46 10 10 00 ff c0 00 00 46 1f 7c 00 ff c0 
00 00 46 0b 98 00 ff c0 00 00 46 31 b8 00 ff c0 00 00 46 12 2c 00 ff c0 00 00 
45 cb 40 00 ff c0 00 00 45 6d f0 00 ff c0 00 00 45 78 70 00 ff c0 00 00 45 9e 
60 00 ff c0 00 00 45 94 d8 00 ff c0 00 00 00 00 00 00 00 00 00 00{noformat}
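To see what the reducer was handed, the logged bytes can be decoded offline. The sketch below assumes the value is t-digest's "small" encoding as written by MergingDigest.asSmallBytes() in t-digest 3.x: an int encoding code (2), min and max as doubles, compression as a float, three shorts (mean.length, tempMean.length, number of centroids), then one (weight, mean) float pair per centroid. That layout is an assumption based on the t-digest source, not something stated in this issue, and the class and method names (ScdDigestDump, decode) are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical offline decoder for the "scd" value logged above.
// ASSUMPTION: bytes follow t-digest 3.x MergingDigest.asSmallBytes():
//   int encoding (2), double min, double max, float compression,
//   short mean.length, short tempMean.length, short centroidCount,
//   then centroidCount pairs of (float weight, float mean).
class ScdDigestDump {

    // Hex dump copied verbatim from the log line (trailing zero bytes included).
    static final String SAMPLE_HEX =
          "00 00 00 02 ff f8 00 00 00 00 00 00 ff f8 00 00 00 00 "
        + "00 00 42 c8 00 00 00 d2 04 1a 00 17 42 8e 00 00 ff c0 00 00 40 40 00 00 ff c0 "
        + "00 00 42 14 00 00 ff c0 00 00 42 60 00 00 ff c0 00 00 42 aa 00 00 ff c0 00 00 "
        + "43 a2 80 00 47 af 57 9b 45 7f d0 00 4a cf c0 db 43 7d 00 00 4d ac 61 02 45 72 "
        + "b0 00 4e 8d 9d bd 43 67 00 00 66 e1 9a 9c 45 aa 70 00 ff c0 00 00 45 72 40 00 "
        + "ff c0 00 00 45 d2 88 00 ff c0 00 00 46 10 10 00 ff c0 00 00 46 1f 7c 00 ff c0 "
        + "00 00 46 0b 98 00 ff c0 00 00 46 31 b8 00 ff c0 00 00 46 12 2c 00 ff c0 00 00 "
        + "45 cb 40 00 ff c0 00 00 45 6d f0 00 ff c0 00 00 45 78 70 00 ff c0 00 00 45 9e "
        + "60 00 ff c0 00 00 45 94 d8 00 ff c0 00 00 00 00 00 00 00 00 00 00";

    // Header fields plus a count of centroids whose mean is NaN.
    static final class Summary {
        double min;
        double max;
        float compression;
        int centroids;
        int nanMeans;
    }

    static byte[] parseHex(String hex) {
        String[] tokens = hex.trim().split("\\s+");
        byte[] bytes = new byte[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            bytes[i] = (byte) Integer.parseInt(tokens[i], 16);
        }
        return bytes;
    }

    static Summary decode(String hex) {
        // ByteBuffer reads big-endian by default, the same order it writes in.
        ByteBuffer buf = ByteBuffer.wrap(parseHex(hex));
        int encoding = buf.getInt();
        if (encoding != 2) {
            throw new IllegalArgumentException("unexpected encoding code: " + encoding);
        }
        Summary s = new Summary();
        s.min = buf.getDouble();
        s.max = buf.getDouble();
        s.compression = buf.getFloat();
        buf.getShort(); // mean.length (array capacity, not needed here)
        buf.getShort(); // tempMean.length (array capacity, not needed here)
        s.centroids = buf.getShort();
        for (int i = 0; i < s.centroids; i++) {
            buf.getFloat();              // centroid weight (unused here)
            float mean = buf.getFloat(); // centroid mean
            if (Float.isNaN(mean)) {
                s.nanMeans++; // a NaN mean is what the "Cannot add NaN" check rejects
            }
        }
        // Any remaining bytes are not read; they are not part of the digest.
        return s;
    }

    public static void main(String[] args) {
        Summary s = decode(SAMPLE_HEX);
        System.out.printf("min=%s max=%s compression=%s centroids=%d nanMeans=%d%n",
                s.min, s.max, s.compression, s.centroids, s.nanMeans);
    }
}
```

Under the assumed layout, running main against the dump above should print min=NaN max=NaN compression=100.0 centroids=23 nanMeans=18, i.e. the digest was already carrying NaN centroids before the reducer tried to merge it. The decoder consumes 214 bytes; the 8 trailing zero bytes are left unread and are likely padding from BytesWritable.getBytes(), which can return a backing array longer than getLength().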



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
