This is an automated email from the ASF dual-hosted git repository.
tallison pushed a change to branch branch_3x
in repository https://gitbox.apache.org/repos/asf/tika.git
from 7d48f34719 TIKA-4488: update logback
new 82f26b63dc TIKA-4646 -- extract hyperlinks from instrText and other areas in ooxml(#2578)
new 65bf98d3c5 TIKA-4646 -- extract hyperlinks from instrText and other areas in ooxml(#2578) fix merge conflicts
The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
Summary of changes:
.../main/java/org/apache/tika/metadata/Office.java | 51 +++
.../org/apache/tika/parser/AutoDetectParser.java | 8 +-
.../microsoft/ooxml/AbstractOOXMLExtractor.java | 26 ++
.../microsoft/ooxml/FieldHyperlinkTracker.java | 168 +++++++++
.../microsoft/ooxml/OOXMLTikaBodyPartHandler.java | 25 ++
.../ooxml/OOXMLWordAndPowerPointTextHandler.java | 187 +++++++++-
.../ooxml/SXWPFWordExtractorDecorator.java | 179 +++++++++-
.../ooxml/XSSFExcelExtractorDecorator.java | 390 +++++++++++++++++++++
.../ooxml/XWPFWordExtractorDecorator.java | 95 ++++-
.../xslf/XSLFEventBasedPowerPointExtractor.java | 5 +
.../ooxml/xwpf/XWPFEventBasedWordExtractor.java | 5 +
.../tika/parser/microsoft/ExcelParserTest.java | 43 +++
.../parser/microsoft/ooxml/OOXMLParserTest.java | 39 +++
.../parser/microsoft/ooxml/SXWPFExtractorTest.java | 109 ++++++
.../parser/microsoft/pst/OutlookPSTParserTest.java | 3 +
.../test-documents/testAttachedTemplate.docx | Bin 0 -> 2284 bytes
.../test-documents/testDataConnections.xlsx | Bin 0 -> 2967 bytes
.../test/resources/test-documents/testDdeLink.xlsx | Bin 0 -> 3030 bytes
.../resources/test-documents/testExternalRefs.docx | Bin 0 -> 2125 bytes
.../resources/test-documents/testFrameset.docx | Bin 0 -> 2328 bytes
.../resources/test-documents/testHoverAndVml.docx | Bin 0 -> 2270 bytes
.../resources/test-documents/testInstrLink.docx | Bin 0 -> 14464 bytes
.../resources/test-documents/testMailMerge.docx | Bin 0 -> 2306 bytes
.../resources/test-documents/testSubdocument.docx | Bin 0 -> 1980 bytes
24 files changed, 1323 insertions(+), 10 deletions(-)
create mode 100644 tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/FieldHyperlinkTracker.java
create mode 100644 tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/resources/test-documents/testAttachedTemplate.docx
create mode 100644 tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/resources/test-documents/testDataConnections.xlsx
create mode 100644 tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/resources/test-documents/testDdeLink.xlsx
create mode 100644 tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/resources/test-documents/testExternalRefs.docx
create mode 100644 tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/resources/test-documents/testFrameset.docx
create mode 100644 tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/resources/test-documents/testHoverAndVml.docx
create mode 100644 tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/resources/test-documents/testInstrLink.docx
create mode 100644 tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/resources/test-documents/testMailMerge.docx
create mode 100644 tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/resources/test-documents/testSubdocument.docx