[ 
https://issues.apache.org/jira/browse/NUTCH-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16277440#comment-16277440
 ] 

ASF GitHub Bot commented on NUTCH-2470:
---------------------------------------

sebastian-nagel opened a new pull request #252: NUTCH-2470 CrawlDbReader -stats 
to show quantiles of score
URL: https://github.com/apache/nutch/pull/252
 
 
   - add quantiles:
   ```
   score quantile 0.01:    -0.002406878347319667
   score quantile 0.05:    0.0
   score quantile 0.1:     0.0
   score quantile 0.2:     0.0
   score quantile 0.25:    0.0
   score quantile 0.3:     0.0
   score quantile 0.4:     0.0
   score quantile 0.5:     0.0
   score quantile 0.6:     1.3519638927815378E-7
   score quantile 0.7:     1.7516442206153347E-5
   score quantile 0.75:    7.380095048071454E-5
   score quantile 0.8:     1.8289990560181772E-4
   score quantile 0.9:     7.733948471002675E-4
   score quantile 0.95:    0.00450346029134317
   score quantile 0.99:    0.2702074978807236
   min score:      -0.007237845566123724
   avg score:      0.010340072900048684
   max score:      70.0
   ```
   - improve precision of score min/max/average
     (represent the score as a float instead of a long scaled by 1000)
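
To illustrate the precision point in the second bullet, here is a minimal sketch. It assumes the earlier representation truncated `score * 1000` to a `long`; that is a guess based on the wording above, not confirmed from the Nutch source. The class and method names are hypothetical.

```java
// Hypothetical sketch, not Nutch code: shows what is lost when a score
// is kept as a long scaled by 1000 instead of as a float.
final class ScorePrecisionSketch {

    // Assumed fixed-point encoding: scale by 1000 and truncate to long.
    static long encodeScaled(float score) {
        return (long) (score * 1000.0);
    }

    // Convert the scaled long back to a score.
    static double decodeScaled(long scaled) {
        return scaled / 1000.0;
    }

    public static void main(String[] args) {
        // Sample values of the same magnitude as those in the listing above.
        float[] scores = {1.7516442e-5f, -0.0024068783f, 70.0f};
        for (float s : scores) {
            System.out.println("float: " + s
                + "\tscaled long round trip: " + decodeScaled(encodeScaled(s)));
        }
    }
}
```

Under this assumption, any score with magnitude below 0.001 collapses to zero after the round trip, which would hide most of the distribution shown in the quantile listing.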

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> CrawlDbReader -stats to show quantiles of score
> -----------------------------------------------
>
>                 Key: NUTCH-2470
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2470
>             Project: Nutch
>          Issue Type: Improvement
>          Components: crawldb
>    Affects Versions: 1.13
>            Reporter: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.14
>
>
> The command "readdb -stats" shows the min., max. and average of the
> CrawlDatum score. Median and quartiles (quantiles, in general) would complete
> the statistics and give an impression of how scores are distributed.
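
As a rough illustration of what a score quantile means here, the following sketch computes nearest-rank quantiles over a small in-memory array. It is only an assumption-laden example: the class name, method name and sample scores are hypothetical, and a sort-based approach is not necessarily what the pull request uses for a large CrawlDb.

```java
import java.util.Arrays;

// Hypothetical sketch, not Nutch code: nearest-rank quantiles of scores.
final class ScoreQuantileSketch {

    // Returns the smallest score such that at least q * n scores are <= it.
    // The input array must already be sorted in ascending order.
    static float quantile(float[] sortedScores, double q) {
        if (sortedScores.length == 0) {
            throw new IllegalArgumentException("no scores");
        }
        if (q < 0.0 || q > 1.0) {
            throw new IllegalArgumentException("q must be in [0, 1]");
        }
        int rank = (int) Math.ceil(q * sortedScores.length);
        int index = Math.max(rank, 1) - 1; // q = 0 maps to the minimum
        return sortedScores[index];
    }

    public static void main(String[] args) {
        // Made-up sample scores, for illustration only.
        float[] scores = {0.0f, 0.5f, 0.0f, 70.0f, -0.007f, 0.001f, 0.0f, 0.27f};
        Arrays.sort(scores);
        for (double q : new double[] {0.25, 0.5, 0.75}) {
            System.out.println("score quantile " + q + ":\t" + quantile(scores, q));
        }
    }
}
```

With a skewed distribution like the one in the listing above, the median says more about a typical score than the average, which a single large value such as the max score of 70.0 can pull upward.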



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
