This is an automated email from the ASF dual-hosted git repository.
snagel pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git.
    from 4c1d94a  Merge pull request #352 from sebastian-nagel/NUTCH-2607-parsechecker-sc-passscoreafterparsing
     add 26616f5  NUTCH-2547 urlnormalizer-basic fails on special characters in path/query
                  NUTCH-2609 urlnormalizer-basic to normalize path of file: URLs
                  - escape more special characters
                  - escape percent when not followed by a valid escape sequence
                    (two-digit hex number)
                  - escape special characters before normalizing the path so that
                    URI.normalize() can be used on valid URIs
                  - also normalize path '/..'
                  - normalize path on file: URLs
                  - complete unit tests
     new a4b4bf6  Merge pull request #353 from sebastian-nagel/nutch-2547-2609-url-normalizer-basic
The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email. The revision
listed as "add" was already present in the repository and has only
been added to this reference.
Summary of changes:
 .../urlnormalizer/basic/BasicURLNormalizer.java | 108 +++++++++++++++++----
 .../basic/TestBasicURLNormalizer.java           |  43 ++++++++
 2 files changed, 134 insertions(+), 17 deletions(-)
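
The escaping and path-normalization steps named in the message of commit
26616f5 can be sketched roughly as follows. This is not the code from
BasicURLNormalizer.java; it is a minimal illustration under stated
assumptions. The class name PathNormalizationSketch, the methods escape()
and normalizePath(), and the small set of characters treated as "special"
are all hypothetical. The sketch also assumes the input is a bare path that
starts with a single '/' and carries no query or fragment.

```java
import java.net.URI;

// Hypothetical illustration only; not the Nutch implementation.
final class PathNormalizationSketch {

    // Assumed sample of characters that are not legal in a URI path.
    private static final String SPECIAL = " \"<>{}|\\^`";

    private static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9')
            || (c >= 'a' && c <= 'f')
            || (c >= 'A' && c <= 'F');
    }

    /**
     * Percent-escapes the assumed special characters, and escapes '%'
     * itself as "%25" when it is not followed by two hex digits.
     */
    static String escape(String path) {
        StringBuilder sb = new StringBuilder(path.length() + 8);
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (c == '%') {
                boolean validEscape = i + 2 < path.length()
                    && isHexDigit(path.charAt(i + 1))
                    && isHexDigit(path.charAt(i + 2));
                sb.append(validEscape ? "%" : "%25");
            } else if (SPECIAL.indexOf(c) >= 0) {
                sb.append(String.format("%%%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /**
     * Escapes first so the path parses as a valid URI, then lets
     * URI.normalize() resolve "." and ".." segments, and finally drops
     * any ".." segments left at the start of the path.
     */
    static String normalizePath(String rawPath) {
        String normalized = URI.create(escape(rawPath)).normalize().getRawPath();
        while (normalized.startsWith("/../")) {
            normalized = normalized.substring(3);
        }
        if (normalized.equals("/..")) {
            normalized = "/";
        }
        return normalized;
    }
}
```

Escaping before normalizing matters because URI.create() rejects a stray
'%' or an unescaped space, so URI.normalize() could not otherwise be
applied to such a path.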