Todd Dixon created TIKA-2597:
--------------------------------

             Summary: Attachment Extraction Case Sensitivity
                 Key: TIKA-2597
                 URL: https://issues.apache.org/jira/browse/TIKA-2597
             Project: Tika
          Issue Type: Bug
          Components: app
    Affects Versions: 1.17
         Environment: windows
            Reporter: Todd Dixon


Using the --extract option on a PDF with embedded files, I am seeing that not 
all of the attachments are extracted. Several of the embedded files have the 
same name. Names that are exactly the same are handled by appending a suffix 
such as (1). However, when two names differ only in case, the extraction does 
not rename the second file with a suffix, so it overwrites the first file on 
disk. Example:
FW Letter.msg
FW letter.msg

These two files result in only one attachment being extracted. Would it be 
possible to update the filename comparison to account for Windows file systems, 
which treat those two names as the same file?
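A minimal sketch of the requested comparison, in Java, assuming the output names are deduplicated in memory before anything is written to disk. The class and method names (CaseInsensitiveNames, uniqueNames, withSuffix) are hypothetical and not part of Tika's API, and placing " (n)" before the extension is an assumption rather than Tika's exact suffix format. Lower-casing with Locale.ROOT only approximates how Windows folds case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class CaseInsensitiveNames {

    // Hypothetical helper: returns one output name per attachment, adding a
    // " (n)" suffix whenever a name collides with an earlier one when the two
    // are compared case-insensitively (approximating a Windows file system).
    static List<String> uniqueNames(List<String> names) {
        Set<String> taken = new HashSet<>();   // lower-cased names already used
        List<String> result = new ArrayList<>();
        for (String name : names) {
            String candidate = name;
            int n = 1;
            // Set.add returns false if the lower-cased form is already taken.
            while (!taken.add(candidate.toLowerCase(Locale.ROOT))) {
                candidate = withSuffix(name, n++);
            }
            result.add(candidate);
        }
        return result;
    }

    // Inserts " (n)" before the extension, e.g. "FW letter.msg" -> "FW letter (1).msg".
    // Names without an extension (or starting with '.') get the suffix at the end.
    static String withSuffix(String name, int n) {
        int dot = name.lastIndexOf('.');
        if (dot <= 0) {
            return name + " (" + n + ")";
        }
        return name.substring(0, dot) + " (" + n + ")" + name.substring(dot);
    }

    public static void main(String[] args) {
        System.out.println(uniqueNames(Arrays.asList("FW Letter.msg", "FW letter.msg")));
        // prints [FW Letter.msg, FW letter (1).msg]
    }
}
```

With this comparison, the two names from the example no longer map to the same file, so both attachments would be written.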

Thanks!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
