This is an automated email from the ASF dual-hosted git repository.
snagel pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git.
    from 45ce310  Merge pull request #257 from smartive/feat/indexer-elastic-rest-languages
     add 607e7d9  NUTCH-2478 HTML parser should resolve base URL <base href=...> - fix parse-html and parse-tika - add unit test for parse-html
     add 2aec79f  NUTCH-2478 HTML parser should resolve base URL <base href=...> - finally fix parse-tika: - href attribute of base element dropped in DOM - need to call tikamd.get("Content-Location") - port HTML parser test from parse-html to parse-tika - add method to DomUtil which prints DocumentFragment
     new d73f293  Merge pull request #263 from sebastian-nagel/nutch-2478-parser-resolve-base-url
The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
Summary of changes:
src/java/org/apache/nutch/util/DomUtil.java | 9 ++++++
.../apache/nutch/parse/html/DOMContentUtils.java | 7 ++---
.../org/apache/nutch/parse/html/HtmlParser.java | 12 ++++++--
.../apache/nutch/parse/html/TestHtmlParser.java | 26 +++++++++++++++++-
.../apache/nutch/parse/tika/DOMContentUtils.java | 7 ++---
.../org/apache/nutch/parse/tika/TikaParser.java | 15 ++++++++--
.../org/apache/nutch/tika}/TestHtmlParser.java | 32 +++++++++++++++++++---
7 files changed, 88 insertions(+), 20 deletions(-)
 copy src/plugin/{parse-html/src/test/org/apache/nutch/parse/html => parse-tika/src/test/org/apache/nutch/tika}/TestHtmlParser.java (81%)
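
For readers unfamiliar with the issue behind NUTCH-2478: relative links in an HTML page should be resolved against the value of <base href=...> when one is present, and against the document URL otherwise. The following is only a minimal sketch of that rule, not the code from these commits. The class BaseUrlSketch and its methods are hypothetical names, and it uses java.net.URI for resolution as an assumption; how Nutch obtains the base href (for example from the "Content-Location" metadata mentioned above for parse-tika) is left out.

```java
import java.net.URI;

// Hypothetical illustration class; not part of Nutch.
public class BaseUrlSketch {

  /**
   * Returns the URI that links should be resolved against: the document
   * URI, unless a non-empty base href is given, in which case the base
   * href is itself resolved against the document URI.
   */
  static URI effectiveBase(URI documentUri, String baseHref) {
    if (baseHref == null || baseHref.trim().isEmpty()) {
      return documentUri;
    }
    return documentUri.resolve(baseHref.trim());
  }

  /** Resolves a (possibly relative) link against the effective base. */
  static URI resolve(URI documentUri, String baseHref, String link) {
    return effectiveBase(documentUri, baseHref).resolve(link);
  }

  public static void main(String[] args) {
    URI doc = URI.create("http://example.com/dir/page.html");
    // No <base> element: resolved against the document URL.
    System.out.println(resolve(doc, null, "img/a.png"));
    // With <base href="http://cdn.example.org/assets/">.
    System.out.println(resolve(doc, "http://cdn.example.org/assets/", "img/a.png"));
  }
}
```

Without a base href the link resolves to http://example.com/dir/img/a.png; with the base href shown it resolves to http://cdn.example.org/assets/img/a.png.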