This is an automated email from the ASF dual-hosted git repository.

snagel pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git.


    from ff800c5  Merge pull request #705 from sebastian-nagel/NUTCH-2867
     new b0cbea5  NUTCH-2891 Upgrade to Tika 2.1.0 - upgrade Nutch core and the plugins parse-tika and language-identifier - parse-tika uses on "tika-parsers-standard-package" (no extended and scientific parsers) - disable Tesseract OCR in tika-config.xml
     new ad61dd1  NUTCH-2891 Upgrade to Tika 2.1.0 - re-enable language-identifier test
     new 621c884  NUTCH-2891 Upgrade to Tika 2.1.0 - remove commons-codec and commons-compress from exclusions to enable parsing of application/x-7z-compressed files
     new 671f904  Merge pull request #700 from sebastian-nagel/NUTCH-2891-tika-2.1

The 3246 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 conf/tika-config.xml.template                      |  16 ++-
 ivy/ivy.xml                                        |   6 +-
 src/java/org/apache/nutch/util/MimeUtil.java       |   3 +-
 src/plugin/language-identifier/ivy.xml             |   9 +-
 src/plugin/language-identifier/plugin.xml          |  11 ++
 .../nutch/analysis/lang/HTMLLanguageParser.java    |  54 +++++----
 .../analysis/lang/TestHTMLLanguageParser.java      |  77 ++++++-------
 src/plugin/parse-tika/howto_upgrade_tika.txt       |   6 +-
 src/plugin/parse-tika/ivy.xml                      |  27 ++---
 src/plugin/parse-tika/plugin.xml                   | 124 ++++++++-------------
 .../org/apache/nutch/parse/tika/TikaParser.java    |   6 +-
 .../org/apache/nutch/parse/tika/TestRTFParser.java |   4 +-
 12 files changed, 181 insertions(+), 162 deletions(-)
