This is an automated email from the ASF dual-hosted git repository.

snagel pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git.


    from 873d7bf  Merge pull request #473 from sebastian-nagel/NUTCH-2381-text-prof-signature-lexicographic-sorting
     new f02c98e  NUTCH-2737 Generator: count and log reason of rejections during selection
                  - add counters for rejections in Generator's SelectorMapper
                  - parameterize log messages to simplify code
     new e46232d  NUTCH-2738 Generator: document property generate.restrict.status
                  - add generate.restrict.status to nutch-default.xml
                  - get status (byte) from status name in setConf() to speed up comparison in SelectorMapper
     new 8d21260  Generator: fix logging of hostdb path
     new 35da06f  NUTCH-2737 Generator: count and log reason of rejections during selection
                  - count rejections by `generate.max.count`
                    * number of hosts (resp. domains) affected
                    * number of URLs skipped total (for all hosts)
     new 44ded9b  Generator: apply formatting
     new 4d68c08  NUTCH-2740 Generator: generate.max.count overflow not logged
     new 2f310ae  Generator: improve description of crawl.gen.delay
     new a2762f0  Merge pull request #477 from sebastian-nagel/NUTCH-2737-generator-log-selection

The 2970 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 conf/nutch-default.xml                          |  17 +-
 src/java/org/apache/nutch/crawl/CrawlDatum.java |   9 +
 src/java/org/apache/nutch/crawl/Generator.java  | 837 ++++++++++++------------
 3 files changed, 456 insertions(+), 407 deletions(-)
