Tim Allison created TIKA-1376:
---------------------------------

             Summary: Improve embedded file name extraction in PDFParser
                 Key: TIKA-1376
                 URL: https://issues.apache.org/jira/browse/TIKA-1376
             Project: Tika
          Issue Type: Improvement
          Components: parser
            Reporter: Tim Allison
            Assignee: Tim Allison
            Priority: Trivial
             Fix For: 1.6

When we extract embedded files from PDFs, we are currently using the key in the PDEmbeddedFilesNameTreeNode as the file name that we store as the value of Metadata.RESOURCE_NAME_KEY in the embedded document's metadata.

I think we should try to get the file name from PDComplexFileSpecification's getFilename() first. If that is null, then we should fall back to the key value.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
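
For illustration only, the fallback proposed in the description could be sketched roughly as below. This is not the actual PDFParser change: the class EmbeddedNameSketch and the helper resolveName are hypothetical names, and the sketch takes plain strings so that it runs without PDFBox or Tika on the classpath. The first argument stands in for whatever PDComplexFileSpecification's getFilename() returns, the second for the key in the PDEmbeddedFilesNameTreeNode.

```java
// Hypothetical sketch; not part of Tika or PDFBox.
class EmbeddedNameSketch {

    /**
     * Picks the name to store under Metadata.RESOURCE_NAME_KEY.
     *
     * @param specFilename stand-in for the result of getFilename() on the
     *                     file specification; may be null
     * @param nameTreeKey  stand-in for the key of the entry in the
     *                     embedded files name tree
     * @return specFilename if it is non-null, otherwise nameTreeKey
     */
    static String resolveName(String specFilename, String nameTreeKey) {
        // Prefer the file specification's name; fall back to the key only on null.
        return specFilename != null ? specFilename : nameTreeKey;
    }

    public static void main(String[] args) {
        // File specification carries a name: that name is used.
        System.out.println(resolveName("report.xls", "Attachment1"));
        // File specification has no name: the name tree key is used.
        System.out.println(resolveName(null, "Attachment1"));
    }
}
```

As in the description, only a null file name triggers the fallback. Whether an empty file name should also fall back to the key is left open here.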