[ https://issues.apache.org/jira/browse/TIKA-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17519749#comment-17519749 ]
Tim Allison commented on TIKA-3711:
-----------------------------------

I'm going to address this in two commits. The first will add configurability for writing the file names to streams. The second commit will be a review of the offending commit: https://github.com/apache/tika/commit/118734a1317fa13ad66959fdc28969ca50a49643 -- I need to review cases where the calling parser has already written xhtml tags.

> Image file names included in parsed Word Document text
> ------------------------------------------------------
>
>                 Key: TIKA-3711
>                 URL: https://issues.apache.org/jira/browse/TIKA-3711
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 2.3.0
>            Reporter: Sam Stephens
>            Priority: Major
>         Attachments: word-doc-with-image-from-word-365.docx, word-doc-with-image.docx
>
>
> The attached Word document includes nothing but a single image. Running it
> through the Tika 2.2.0 AutoDetectParser correctly returns null. Running it
> through the Tika 2.3.0 AutoDetectParser returns the text:
> {{image1.png}}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)