Kshitij created TIKA-4057:
-----------------------------

             Summary: Skip Thumbnails from Metadata When Scanning PPTX files
                 Key: TIKA-4057
                 URL: https://issues.apache.org/jira/browse/TIKA-4057
             Project: Tika
          Issue Type: Wish
          Components: metadata, mime
    Affects Versions: 2.6.0
            Reporter: Kshitij


I am scanning PPTX files with Tika parser/core 2.6.0 and using an
EmbeddedDocumentExtractor to check whether a PPTX contains embedded images.
The metadata includes the thumbnail, with MIME type "image/jpeg". The key and
value for the thumbnail are "dc:title" and "/docProps/thumbnail.jpeg"
respectively. So even when the PPTX file has no embedded image, the result
always shows "Embedded image present" because of the thumbnail. Is there any
way to introduce a parameter in OfficeParserConfig that would skip thumbnails
while parsing? Thanks.
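
Until such a parameter exists, the check could be done on the caller's side.
The sketch below is a minimal illustration, not Tika's API: a plain Map stands
in for the Metadata object handed to the extractor, and ThumbnailFilter,
isThumbnail and isEmbeddedImage are hypothetical names. It assumes the
thumbnail can be recognised by the "dc:title" value reported above and that
the MIME type is stored under "Content-Type".

```java
import java.util.Map;

// Hypothetical helper: a plain Map stands in for Tika's Metadata object.
class ThumbnailFilter {
    // Key and path prefix as reported in this issue (assumptions, not Tika constants).
    static final String NAME_KEY = "dc:title";
    static final String THUMBNAIL_PREFIX = "/docProps/thumbnail";
    // Assumed key under which the MIME type is stored.
    static final String CONTENT_TYPE_KEY = "Content-Type";

    // True when the entry's name points at the package thumbnail.
    static boolean isThumbnail(Map<String, String> metadata) {
        String name = metadata.get(NAME_KEY);
        return name != null && name.startsWith(THUMBNAIL_PREFIX);
    }

    // True for image entries that are not the thumbnail.
    static boolean isEmbeddedImage(Map<String, String> metadata) {
        String type = metadata.get(CONTENT_TYPE_KEY);
        return type != null && type.startsWith("image/") && !isThumbnail(metadata);
    }

    public static void main(String[] args) {
        Map<String, String> thumbnail = Map.of(
                NAME_KEY, "/docProps/thumbnail.jpeg",
                CONTENT_TYPE_KEY, "image/jpeg");
        // Hypothetical entry for a picture placed on a slide.
        Map<String, String> picture = Map.of(
                NAME_KEY, "image1.png",
                CONTENT_TYPE_KEY, "image/png");
        System.out.println(isEmbeddedImage(thumbnail));
        System.out.println(isEmbeddedImage(picture));
    }
}
```

In a real extractor the same test would run against the Metadata passed for
each embedded document, so that "Embedded image present" is reported only when
isEmbeddedImage returns true.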



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
