This is an automated email from the ASF dual-hosted git repository.
tallison pushed a change to branch branch_1x
in repository https://gitbox.apache.org/repos/asf/tika.git.
from 7565e15 TIKA-3078 -- add configurability to GeoParser
new 18b6645 use byte buffers when reading the legacy OneNote 2007 files (#314)
new cd9a891 improve file mangling
new 4cec35f turn @Ignore back on in TestCorruptedFiles.java
new 0f4d5de TIKA-3081 -- convert TikaInputStream's skip to the equivalent of skipFully
new 73b26ef TIKA-3080 -- prevent infinite loop in CharsetMatch.getString
new f7f1be6 Improve TikaMemoryLimitException msg
new 333d990 avoid npe in MP4Parser
new bc93f5e TIKA-2572 -- review overly broad catches
new f9607f9 improve ICNSParser
new b826687 fix one note parser entry
new d2a4f2a prevent oss-index from failing the build -- turn on at release time!
new a3fef30 TIKA-3085 -- switch to batch inserts in tika-eval
new 57193f5 improve mp3 parser
new 171f434 TIKA-3087 -- general upgrades for 1.24.1
The 14 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
Summary of changes:
.../org/apache/tika/detect/AutoDetectReader.java | 2 +-
.../org/apache/tika/detect/XmlRootExtractor.java | 2 +
.../tika/exception/TikaMemoryLimitException.java | 10 ++
.../java/org/apache/tika/io/TikaInputStream.java | 24 +++-
.../org/apache/tika/parser/CompositeParser.java | 9 +-
.../java/org/apache/tika/utils/CharsetUtils.java | 9 +-
.../java/org/apache/tika/eval/db/JDBCUtil.java | 30 +++++
.../java/org/apache/tika/eval/io/DBWriter.java | 35 ++++--
tika-parent/pom.xml | 11 +-
tika-parsers/pom.xml | 8 +-
.../tika/parser/apple/AppleSingleFileParser.java | 6 +-
.../org/apache/tika/parser/crypto/TSDParser.java | 6 +-
.../tika/parser/html/HtmlEncodingDetector.java | 3 +-
.../org/apache/tika/parser/image/ICNSParser.java | 10 +-
.../apache/tika/parser/mbox/OutlookPSTParser.java | 10 +-
.../apache/tika/parser/microsoft/OfficeParser.java | 5 +-
.../onenote/OneNoteLegacyDumpStrings.java | 87 ++++++++-----
.../tika/parser/microsoft/onenote/OneNotePtr.java | 12 +-
.../microsoft/onenote/OneNoteTreeWalker.java | 3 +-
.../microsoft/ooxml/AbstractOOXMLExtractor.java | 9 +-
.../ooxml/SXWPFWordExtractorDecorator.java | 2 +
.../org/apache/tika/parser/mp3/ID3v2Frame.java | 6 +
.../java/org/apache/tika/parser/mp4/MP4Parser.java | 22 ++--
.../parser/ner/corenlp/CoreNLPNERecogniser.java | 2 +-
.../apache/tika/parser/ocr/TesseractOCRParser.java | 4 +-
.../tika/parser/pdf/ImageGraphicsEngine.java | 3 +-
.../parser/pkg/StreamingZipContainerDetector.java | 4 +-
.../apache/tika/parser/rtf/RTFEmbObjHandler.java | 4 +-
.../apache/tika/parser/rtf/RTFObjDataParser.java | 12 +-
.../org/apache/tika/parser/rtf/TextExtractor.java | 2 +-
.../org/apache/tika/parser/txt/CharsetMatch.java | 2 +-
.../tika/parser/txt/Icu4jEncodingDetector.java | 2 +-
.../tika/parser/txt/UniversalEncodingListener.java | 2 +-
.../services/org.apache.tika.parser.Parser | 1 -
.../java/org/apache/tika/TestCorruptedFiles.java | 140 ++++++++++++++++-----
.../tika/server/resource/UnpackerResource.java | 2 +-
36 files changed, 358 insertions(+), 143 deletions(-)