This is an automated email from the ASF dual-hosted git repository.

snagel pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git.

    from 0cec7b5  Merge pull request #335 from r0ann3l/NUTCH-2580
     add 4f73c63  NUTCH-2583 Upgrading Nutch's dependencies
                  - apply patch contributed by Ralf
     add 20ecad2  NUTCH-2584 Upgrade parse-tika to use Tika 1.18
     add f5e3a30  NUTCH-2584 Upgrade parse-tika to use Tika 1.18
                  - fix failing unit tests
                  - use Tika parser to get DOM tree of test documents
                  - fix HTMLMetaProcessor to extract no-cache and base-href
                    attributes on DOM tree modified by Tika
                  - ignore links from FORM and SOURCE elements which are not
                    extracted by Tika parser
     add 217e646  Add target "report" to view dependency tree of plugins
     add 107b364  NUTCH-2589 HTML redirections are not followed when using parse-tika
                  - extract meta-refresh redirects from DOM tree normalized by Tika
                  - add unit test to check whether meta-refresh redirects are
                    extracted and parse status holds the redirect target
     new 2544fad  Merge pull request #336 from sebastian-nagel/NUTCH-2583-upgrade-dependencies

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.

Summary of changes:
 ivy/ivy.xml                                        |  67 +++++------
 src/plugin/build-plugin.xml                        |   4 +
 src/plugin/parse-tika/build.xml                    |  15 +--
 src/plugin/parse-tika/howto_upgrade_tika.txt       |  16 ++-
 src/plugin/parse-tika/ivy.xml                      |   2 +-
 src/plugin/parse-tika/plugin.xml                   |  65 +++++++----
 .../apache/nutch/parse/tika/HTMLMetaProcessor.java | 125 +++++++++++++--------
 .../org/apache/nutch/parse/tika/TikaParser.java    |  20 ++--
 .../{ => parse}/tika/TestDOMContentUtils.java      |  78 +++++++------
 .../nutch/{ => parse}/tika/TestFeedParser.java     |   2 +-
 .../nutch/{ => parse}/tika/TestHtmlParser.java     |   2 +-
 .../nutch/{ => parse}/tika/TestImageMetadata.java  |   2 +-
 .../nutch/{ => parse}/tika/TestMSWordParser.java   |   2 +-
 .../nutch/{ => parse}/tika/TestOOParser.java       |   2 +-
 .../nutch/{ => parse}/tika/TestPdfParser.java      |   2 +-
 .../nutch/{ => parse}/tika/TestRTFParser.java      |   2 +-
 .../{ => parse}/tika/TestRobotsMetaProcessor.java  |  70 ++++++++----
 17 files changed, 276 insertions(+), 200 deletions(-)
 rename src/plugin/parse-tika/src/test/org/apache/nutch/{ => parse}/tika/TestDOMContentUtils.java (89%)
 rename src/plugin/parse-tika/src/test/org/apache/nutch/{ => parse}/tika/TestFeedParser.java (99%)
 rename src/plugin/parse-tika/src/test/org/apache/nutch/{ => parse}/tika/TestHtmlParser.java (99%)
 rename src/plugin/parse-tika/src/test/org/apache/nutch/{ => parse}/tika/TestImageMetadata.java (98%)
 rename src/plugin/parse-tika/src/test/org/apache/nutch/{ => parse}/tika/TestMSWordParser.java (98%)
 rename src/plugin/parse-tika/src/test/org/apache/nutch/{ => parse}/tika/TestOOParser.java (98%)
 rename src/plugin/parse-tika/src/test/org/apache/nutch/{ => parse}/tika/TestPdfParser.java (98%)
 rename src/plugin/parse-tika/src/test/org/apache/nutch/{ => parse}/tika/TestRTFParser.java (98%)
 rename src/plugin/parse-tika/src/test/org/apache/nutch/{ => parse}/tika/TestRobotsMetaProcessor.java (68%)

-- 
To stop receiving notification emails like this one, please contact
sna...@apache.org.