This is an automated email from the ASF dual-hosted git repository.

snagel pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git.


    from cf183ad  Merge pull request #358 from sebastian-nagel/NUTCH-2071
     add 579a76b  NUTCH-1106 Options to skip url's based on length - add
                  property db.max.outlink.length to limit length of outlinks
                  and redirects (default = 8192 characters) - add rule (not
                  active) to regex-urlfilter.txt.template
     add 8d434b5  NUTCH-1106 Options to skip url's based on length - most
                  browsers support URLs up to around 2048 characters - use
                  this value for the rule in regex-urlfilter.txt - limit
                  outlink length to 4096 characters to allow additional
                  characters removed during normalization (anchor, query args)
     new f263d91  Merge pull request #359 from sebastian-nagel/NUTCH-1106-max-outlink-length

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 conf/nutch-default.xml                                 | 15 +++++++++++++++
 conf/regex-urlfilter.txt.template                      |  3 +++
 src/java/org/apache/nutch/fetcher/FetcherThread.java   | 12 ++++++++++--
 src/java/org/apache/nutch/parse/ParseOutputFormat.java |  8 +++++++-
 4 files changed, 35 insertions(+), 3 deletions(-)
