This is an automated email from the ASF dual-hosted git repository.
snagel pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git
from f8577a0d7 NUTCH-3143 GitHub workflow does not run all unit tests (#890)
add 3fb806830 NUTCH-3110 Upgrade to Tika 3.1.0
add 76ced9b18 NUTCH-3110 Upgrade to Tika 3.1.0
add 713835b73 NUTCH-3110 Upgrade to Tika 3.2.3
new 3101a9e6f Merge pull request #887 from lewismc/NUTCH-3110
The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
Summary of changes:
ivy/ivy.xml | 2 +-
src/plugin/language-identifier/ivy.xml | 2 +-
src/plugin/language-identifier/plugin.xml | 13 +-
src/plugin/parse-js/plugin.xml | 4 +-
src/plugin/parse-tika/howto_upgrade_tika.md | 37 ++--
src/plugin/parse-tika/ivy.xml | 10 +-
src/plugin/parse-tika/plugin.xml | 92 ++++++++-
.../apache/nutch/parse/tika/DOMContentUtils.java | 54 ++++-
.../org/apache/nutch/parse/tika/TikaParser.java | 58 +++---
.../nutch/parse/tika/TestBoilerpipeExtraction.java | 112 +++++++++++
.../nutch/parse/tika/TestEncodingDetection.java | 193 ++++++++++++++++++
.../apache/nutch/parse/tika/TestHtmlParser.java | 4 +-
.../parse/tika/TestLinkExtractionEdgeCases.java | 221 ++++++++++++++++++++
.../nutch/parse/tika/TestMetadataExtraction.java | 223 +++++++++++++++++++++
.../parse/tika/TestParserFailureHandling.java | 221 ++++++++++++++++++++
15 files changed, 1171 insertions(+), 75 deletions(-)
create mode 100644 src/plugin/parse-tika/src/test/org/apache/nutch/parse/tika/TestBoilerpipeExtraction.java
create mode 100644 src/plugin/parse-tika/src/test/org/apache/nutch/parse/tika/TestEncodingDetection.java
create mode 100644 src/plugin/parse-tika/src/test/org/apache/nutch/parse/tika/TestLinkExtractionEdgeCases.java
create mode 100644 src/plugin/parse-tika/src/test/org/apache/nutch/parse/tika/TestMetadataExtraction.java
create mode 100644 src/plugin/parse-tika/src/test/org/apache/nutch/parse/tika/TestParserFailureHandling.java