This is an automated email from the ASF dual-hosted git repository.

snagel pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git.


    from 45ce310  Merge pull request #257 from smartive/feat/indexer-elastic-rest-languages
     add 607e7d9  NUTCH-2478 HTML parser should resolve base URL <base href=...> - fix parse-html and parse-tika - add unit test for parse-html
     add 2aec79f  NUTCH-2478 HTML parser should resolve base URL <base href=...> - finally fix parse-tika:   - href attribute of base element dropped in DOM   - need to call tikamd.get("Content-Location") - port HTML parser test from parse-html to parse-tika - add method to DomUtil which prints DocumentFragment
     new d73f293  Merge pull request #263 from sebastian-nagel/nutch-2478-parser-resolve-base-url

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 src/java/org/apache/nutch/util/DomUtil.java        |  9 ++++++
 .../apache/nutch/parse/html/DOMContentUtils.java   |  7 ++---
 .../org/apache/nutch/parse/html/HtmlParser.java    | 12 ++++++--
 .../apache/nutch/parse/html/TestHtmlParser.java    | 26 +++++++++++++++++-
 .../apache/nutch/parse/tika/DOMContentUtils.java   |  7 ++---
 .../org/apache/nutch/parse/tika/TikaParser.java    | 15 ++++++++--
 .../org/apache/nutch/tika}/TestHtmlParser.java     | 32 +++++++++++++++++++---
 7 files changed, 88 insertions(+), 20 deletions(-)
 copy src/plugin/{parse-html/src/test/org/apache/nutch/parse/html => parse-tika/src/test/org/apache/nutch/tika}/TestHtmlParser.java (81%)
