[jira] [Commented] (NUTCH-3102) CrawlDbReader -stats fails with Cannot add NaN to t-digest

Sebastian Nagel (Jira) Wed, 08 Jan 2025 05:01:21 -0800


    [ 
https://issues.apache.org/jira/browse/NUTCH-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17911070#comment-17911070
 ]


Sebastian Nagel commented on NUTCH-3102:
----------------------------------------

Hi [~marcos], could you share the CrawlDb on which this error occurs? Ideally, a 
CrawlDb as small as possible. In addition, could you indicate on which system 
(OS, architecture, Java version, etc.) the issue was observed? Serialization 
issues are very difficult to reproduce and, hence, difficult to fix. Thanks!

> CrawlDbReader -stats fails with Cannot add NaN to t-digest
> ----------------------------------------------------------
>
>                 Key: NUTCH-3102
>                 URL: https://issues.apache.org/jira/browse/NUTCH-3102
>             Project: Nutch
>          Issue Type: Bug
>          Components: crawldb
>    Affects Versions: 1.19
>            Reporter: Marcos Gomez
>            Priority: Major
>             Fix For: 1.21
>
>
> When running in local mode CrawlDbReader / readdb -stats fails with 
> "java.lang.Exception: java.lang.IllegalArgumentException: Cannot add NaN to 
> t-digest"
>  
> {noformat}
> java.lang.Exception: java.lang.IllegalArgumentException: Cannot add NaN to 
> t-digest
>     at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:492) 
> ~[hadoop-mapreduce-client-common-3.3.4.jar:?]
>     at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:559) 
> ~[hadoop-mapreduce-client-common-3.3.4.jar:?]
> Caused by: java.lang.IllegalArgumentException: Cannot add NaN to t-digest
>     at com.tdunning.math.stats.MergingDigest.add(MergingDigest.java:256) 
> ~[t-digest-3.3.jar:?]
>     at com.tdunning.math.stats.MergingDigest.add(MergingDigest.java:246) 
> ~[t-digest-3.3.jar:?]
>     at com.tdunning.math.stats.AbstractTDigest.add(AbstractTDigest.java:135) 
> ~[t-digest-3.3.jar:?]
>     at 
> org.apache.nutch.crawl.CrawlDbReader$CrawlDbStatReducer.reduce(CrawlDbReader.java:489)
>  ~[apache-nutch-1.19.jar:?]
>     at 
> org.apache.nutch.crawl.CrawlDbReader$CrawlDbStatReducer.reduce(CrawlDbReader.java:422)
>  ~[apache-nutch-1.19.jar:?]
>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) 
> ~[hadoop-mapreduce-client-core-3.3.4.jar:?]
>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:628) 
> ~[hadoop-mapreduce-client-core-3.3.4.jar:?]
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390) 
> ~[hadoop-mapreduce-client-core-3.3.4.jar:?]
>     at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:347)
>  ~[hadoop-mapreduce-client-common-3.3.4.jar:?]
>     at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  ~[?:?]
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  ~[?:?]
>     at java.lang.Thread.run(Thread.java:829) ~[?:?]{noformat}
> I added logging to find out why this happens. Apparently the t-digest is 
> being built from a BytesWritable object with this value:
> {noformat}
> Error adding scd value: 00 00 00 02 ff f8 00 00 00 00 00 00 ff f8 00 00 00 00 
> 00 00 42 c8 00 00 00 d2 04 1a 00 17 42 8e 00 00 ff c0 00 00 40 40 00 00 ff c0 
> 00 00 42 14 00 00 ff c0 00 00 42 60 00 00 ff c0 00 00 42 aa 00 00 ff c0 00 00 
> 43 a2 80 00 47 af 57 9b 45 7f d0 00 4a cf c0 db 43 7d 00 00 4d ac 61 02 45 72 
> b0 00 4e 8d 9d bd 43 67 00 00 66 e1 9a 9c 45 aa 70 00 ff c0 00 00 45 72 40 00 
> ff c0 00 00 45 d2 88 00 ff c0 00 00 46 10 10 00 ff c0 00 00 46 1f 7c 00 ff c0 
> 00 00 46 0b 98 00 ff c0 00 00 46 31 b8 00 ff c0 00 00 46 12 2c 00 ff c0 00 00 
> 45 cb 40 00 ff c0 00 00 45 6d f0 00 ff c0 00 00 45 78 70 00 ff c0 00 00 45 9e 
> 60 00 ff c0 00 00 45 94 d8 00 ff c0 00 00 00 00 00 00 00 00 00 00{noformat}
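
The dump above can be read by hand. The following is a minimal sketch, assuming the bytes are the compact ("small") serialization that t-digest's MergingDigest writes: a 4-byte encoding code, min and max as 8-byte doubles, the compression as a 4-byte float, three 2-byte shorts (two buffer sizes and the centroid count), then one (weight, mean) float pair per centroid. That layout is an assumption, not something confirmed in this issue. The class and field names are hypothetical and only the JDK is used.

```java
import java.nio.ByteBuffer;

public class TDigestHeaderDump {

    /** Decoded header fields plus the first centroid (illustrative names). */
    static final class Header {
        int encoding;
        double min;
        double max;
        float compression;
        short mainBufferSize;
        short tempBufferSize;
        short centroidCount;
        float firstWeight;
        float firstMean;
    }

    /** Parses a whitespace-separated hex string such as "00 00 00 02 ff f8". */
    static byte[] parseHex(String hex) {
        String[] parts = hex.trim().split("\\s+");
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return out;
    }

    /** Reads the fields in the ASSUMED order; ByteBuffer is big-endian by default. */
    static Header decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        Header h = new Header();
        h.encoding = buf.getInt();
        h.min = buf.getDouble();
        h.max = buf.getDouble();
        h.compression = buf.getFloat();
        h.mainBufferSize = buf.getShort();
        h.tempBufferSize = buf.getShort();
        h.centroidCount = buf.getShort();
        h.firstWeight = buf.getFloat();
        h.firstMean = buf.getFloat();
        return h;
    }

    public static void main(String[] args) {
        // First 38 bytes of the value logged in the issue description.
        String hex = "00 00 00 02 ff f8 00 00 00 00 00 00 ff f8 00 00 00 00 00 00 "
                + "42 c8 00 00 00 d2 04 1a 00 17 42 8e 00 00 ff c0 00 00";
        Header h = decode(parseHex(hex));
        System.out.println("encoding=" + h.encoding
                + " min=" + h.min + " max=" + h.max
                + " compression=" + h.compression
                + " centroids=" + h.centroidCount
                + " firstWeight=" + h.firstWeight
                + " firstMean=" + h.firstMean);
    }
}
```

Independent of the assumed layout, ff f8 00 00 00 00 00 00 is a NaN bit pattern for an IEEE 754 double and ff c0 00 00 is one for a float. If the layout holds, the stored min, max, and several centroid means are NaN, which would be consistent with the "Cannot add NaN to t-digest" exception when the reducer merges this value.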



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
