This is an automated email from the ASF dual-hosted git repository.
snagel pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git.
from 0f46927 Merge pull request #476 from sebastian-nagel/NUTCH-2482-index-geoip-npe
new 29865b2 NUTCH-2457 Embedded documents likely not correctly parsed by Tika - add unit test for embedded documents
new 9c424f9 NUTCH-2457 Embedded documents likely not correctly parsed by Tika - remove needless unit test whether document to be tested is opened by parse-tika
new c9238a1 NUTCH-2457 Embedded documents likely not correctly parsed by Tika - add AutoDetectParser to ParseContext, so that it is called for embedded documents - if `tika.parse.embedded` is true (false disables recursive parsing of embedded documents)
new 9e49c3f Merge pull request #474 from sebastian-nagel/NUTCH-2457-parse-tika-embedded-docs
The 2960 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
Summary of changes:
conf/nutch-default.xml | 8 ++++++++
src/plugin/parse-tika/build.xml | 1 +
.../parse-tika/sample/test_recursive_embedded.docx | Bin 0 -> 27082 bytes
.../org/apache/nutch/parse/tika/TikaParser.java | 9 ++++++++-
...SWordParser.java => TestEmbeddedDocuments.java} | 22 ++++++---------------
.../apache/nutch/parse/tika/TestMSWordParser.java | 5 ++---
.../org/apache/nutch/parse/tika/TestOOParser.java | 2 +-
.../org/apache/nutch/parse/tika/TestPdfParser.java | 3 +--
.../org/apache/nutch/parse/tika/TestRTFParser.java | 3 +--
9 files changed, 28 insertions(+), 25 deletions(-)
create mode 100644 src/plugin/parse-tika/sample/test_recursive_embedded.docx
copy src/plugin/parse-tika/src/test/org/apache/nutch/parse/tika/{TestMSWordParser.java => TestEmbeddedDocuments.java} (78%)
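
Commit c9238a1 above makes recursive parsing of embedded documents depend on the `tika.parse.embedded` property, and the diffstat shows new lines in conf/nutch-default.xml. A minimal sketch of how a site might override that property follows. It assumes the usual Nutch convention of placing local overrides in conf/nutch-site.xml; the value and the description text are illustrative and are not copied from the commit.

```xml
<?xml version="1.0"?>
<!-- Illustrative override, assumed to live in conf/nutch-site.xml.
     Only the property name is taken from the commit message above;
     the value and description are examples, not the committed text. -->
<configuration>
  <property>
    <name>tika.parse.embedded</name>
    <!-- false disables recursive parsing of embedded documents;
         true lets parse-tika hand embedded documents to Tika's parser -->
    <value>false</value>
    <description>Example: do not parse documents embedded in other
    documents (for instance, attachments inside a .docx file).</description>
  </property>
</configuration>
```

The default shipped in conf/nutch-default.xml is not shown in this email, so check that file before relying on either setting.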