This is an automated email from the ASF dual-hosted git repository.

snagel pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git.


    from 0f46927  Merge pull request #476 from sebastian-nagel/NUTCH-2482-index-geoip-npe
     new 29865b2  NUTCH-2457 Embedded documents likely not correctly parsed by Tika - add unit test for embedded documents
     new 9c424f9  NUTCH-2457 Embedded documents likely not correctly parsed by Tika - remove needless unit test whether document to be tested is opened by parse-tika
     new c9238a1  NUTCH-2457 Embedded documents likely not correctly parsed by Tika - add AutoDetectParser to ParseContext, so that it is called for embedded documents - if `tika.parse.embedded` is true (false disables recursive parsing of embedded documents)
     new 9e49c3f  Merge pull request #474 from sebastian-nagel/NUTCH-2457-parse-tika-embedded-docs

The 4 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 conf/nutch-default.xml                             |   8 ++++++++
 src/plugin/parse-tika/build.xml                    |   1 +
 .../parse-tika/sample/test_recursive_embedded.docx | Bin 0 -> 27082 bytes
 .../org/apache/nutch/parse/tika/TikaParser.java    |   9 ++++++++-
 ...SWordParser.java => TestEmbeddedDocuments.java} |  22 ++++++---------------
 .../apache/nutch/parse/tika/TestMSWordParser.java  |   5 ++---
 .../org/apache/nutch/parse/tika/TestOOParser.java  |   2 +-
 .../org/apache/nutch/parse/tika/TestPdfParser.java |   3 +--
 .../org/apache/nutch/parse/tika/TestRTFParser.java |   3 +--
 9 files changed, 28 insertions(+), 25 deletions(-)
 create mode 100644 src/plugin/parse-tika/sample/test_recursive_embedded.docx
 copy src/plugin/parse-tika/src/test/org/apache/nutch/parse/tika/{TestMSWordParser.java => TestEmbeddedDocuments.java} (78%)
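
For readers who want to toggle the behaviour described in c9238a1, the following is a minimal sketch of a local override, assuming the usual Nutch convention of overriding defaults from conf/nutch-default.xml in conf/nutch-site.xml. Only the property name `tika.parse.embedded` and the meaning of true/false come from the commit message above; the description text is illustrative, and the shipped default value is not stated in this email.

```xml
<!-- conf/nutch-site.xml: hypothetical local override, not the committed text -->
<property>
  <name>tika.parse.embedded</name>
  <!-- true: embedded documents are parsed recursively;
       false: recursive parsing of embedded documents is disabled -->
  <value>false</value>
  <description>Illustrative description only: whether parse-tika
  should also parse documents embedded in the fetched document.
  </description>
</property>
```

Setting the value to false would be the conservative choice when embedded content (for example attachments inside a .docx file) should not contribute to the parsed text.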
