Natalia Escalera created TIKA-3765:
--------------------------------------
Summary: setExtractAnnotationText setting ignored after v2.3.0
Key: TIKA-3765
URL: https://issues.apache.org/jira/browse/TIKA-3765
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 2.4.0, 2.3.0
Reporter: Natalia Escalera
Before version 2.3.0, setting PDFParserConfig.setExtractAnnotationText(false)
excluded image annotations from the text extracted from a PowerPoint document.
Since Tika 2.3.0 the setting is ignored, and the image name still appears in the extracted content:
{code:java}
content == "Test Test 1 "
|       |
|       false
|       11 differences (52% similarity)
|       Test Test 1 (image1.jpg )
|       Test Test 1 (-----------)
Test Test 1 image1.jpg 
{code}
The issue can be reproduced by creating a .pptx document that contains an image
and parsing it with Tika using setExtractAnnotationText(false).
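A minimal reproduction sketch follows, assuming Tika 2.x with the standard parser package (tika-parsers-standard-package) on the classpath. The class name `Tika3765Repro`, the helper `mentionsImage`, and the input file `test-with-image.pptx` (a deck with one text box and one picture named image1.jpg) are hypothetical. The code only shows how the PDFParserConfig setting is handed to the parser through ParseContext, as described above; it does not assert which parser actually reads that config.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.pdf.PDFParserConfig;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical reproduction harness for TIKA-3765 (a sketch, not the reporter's test).
public class Tika3765Repro {

    /** Parses the file with annotation-text extraction disabled via PDFParserConfig. */
    static String extractWithoutAnnotations(Path file) throws Exception {
        PDFParserConfig pdfConfig = new PDFParserConfig();
        pdfConfig.setExtractAnnotationText(false); // the setting reported as ignored

        ParseContext context = new ParseContext();
        context.set(PDFParserConfig.class, pdfConfig);

        AutoDetectParser parser = new AutoDetectParser();
        BodyContentHandler handler = new BodyContentHandler(-1); // -1 disables the write limit
        Metadata metadata = new Metadata();
        try (InputStream stream = Files.newInputStream(file)) {
            parser.parse(stream, handler, metadata, context);
        }
        return handler.toString();
    }

    /** Hypothetical helper: true if the extracted text still mentions the image file name. */
    static boolean mentionsImage(String content, String imageName) {
        return content.contains(imageName);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input file; pass a different path as the first argument if needed.
        Path pptx = Paths.get(args.length > 0 ? args[0] : "test-with-image.pptx");
        String content = extractWithoutAnnotations(pptx);
        System.out.println("[" + content + "]");
        System.out.println("image name present: " + mentionsImage(content, "image1.jpg"));
    }
}
```

On an affected version, the report above suggests the second line would print `true`; with the pre-2.3.0 behaviour the image name would be absent.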