This is an automated email from the ASF dual-hosted git repository.
tallison pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tika.git.
from c11cab6 upgrade jackcess
new 8028a00 improve robustness of image processing in PDFs
new 3096f3f fix unit test to handle counts w and w/out tesseract
new cba0372 TIKA-3316 -- improve XPS parser to include open XPS and allow for streaming zips with data descriptors
The 3 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
Summary of changes:
.../detect/microsoft/ooxml/OPCPackageDetector.java | 8 +-
.../microsoft/ooxml/OOXMLExtractorFactory.java | 14 ++-
.../microsoft/ooxml/xps/XPSExtractorDecorator.java | 5 ++
.../parser/microsoft/ooxml/xps/XPSParserTest.java | 45 +++++++++-
.../test-documents/testXPSWithDataDescriptor.xps | Bin 0 -> 44523 bytes
.../test-documents/testXPSWithDataDescriptor2.xps | Bin 0 -> 51175 bytes
.../apache/tika/parser/pdf/AbstractPDF2XHTML.java | 25 ++++--
.../detect/zip/DefaultZipContainerDetector.java | 38 +++++++--
.../org/apache/tika/zip/utils/ZipSalvager.java | 95 +++++++++++++--------
.../tika/parser/microsoft/rtf/RTFParserTest.java | 5 +-
10 files changed, 181 insertions(+), 54 deletions(-)
create mode 100644 tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-microsoft-module/src/test/resources/test-documents/testXPSWithDataDescriptor.xps
create mode 100644 tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-microsoft-module/src/test/resources/test-documents/testXPSWithDataDescriptor2.xps