[ https://issues.apache.org/jira/browse/TIKA-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383880#comment-16383880 ]
Nick Burch commented on TIKA-2597:
----------------------------------
Trying to fully re-implement the Windows case-insensitivity rules doesn't sound
like much fun... unless someone can find a small library or JRE system function
that does it for us?
Otherwise, Microsoft have been doing some work recently to fix various Windows
bugs and limitations around case sensitivity. You might find it easier to just
turn on per-directory case sensitivity for your extraction directories! Details,
from a few days ago, are at
https://blogs.msdn.microsoft.com/commandline/2018/02/28/per-directory-case-sensitivity-and-wsl/
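One way to take the "JRE system function" route without re-implementing any
case rules is to let the target file system decide whether a name is already
taken. The Java sketch below assumes each attachment is written as soon as its
name is chosen; `UniqueOutputPath`, `createUnique` and the `base(1).ext` suffix
layout are hypothetical, not Tika's actual code. It relies only on the JDK's
`Files.createFile`, which fails with `FileAlreadyExistsException` when an entry
with that name exists, so on a case-insensitive Windows volume "FW letter.msg"
collides with "FW Letter.msg", while on a case-sensitive file system both names
are kept unchanged.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class UniqueOutputPath {
    // Hypothetical helper: creates a new, empty file in dir for the given name.
    // The collision check is delegated to the file system via Files.createFile,
    // so name comparison follows the volume's own rules (case-insensitive on a
    // default Windows/NTFS volume) with no case folding done in Java.
    static Path createUnique(Path dir, String name) throws IOException {
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        String ext = dot > 0 ? name.substring(dot) : "";
        Path candidate = dir.resolve(name);
        for (int i = 1; ; i++) {
            try {
                // Check-and-create in a single call: fails if the name is taken.
                return Files.createFile(candidate);
            } catch (FileAlreadyExistsException e) {
                // Assumed suffix layout "base(i).ext", for illustration only.
                candidate = dir.resolve(base + "(" + i + ")" + ext);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("extract");
        for (String name : new String[] {"FW Letter.msg", "FW letter.msg"}) {
            Path out = createUnique(dir, name);
            Files.write(out, name.getBytes(StandardCharsets.UTF_8));
            System.out.println(out.getFileName());
        }
    }
}
```

Because the check and the create happen in one call, two concurrent writers
cannot claim the same name; the trade-off is that the final name is only known
once the file exists on disk.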
> Attachment Extraction Case Sensitivity
> --------------------------------------
>
> Key: TIKA-2597
> URL: https://issues.apache.org/jira/browse/TIKA-2597
> Project: Tika
> Issue Type: Bug
> Components: app
> Affects Versions: 1.17
> Environment: windows
> Reporter: Todd Dixon
> Priority: Major
>
> Using the --extract option on a PDF with embedded files, I am seeing that not
> all of the attachments are extracted. Several of the embedded files have the
> same name. Names that are exactly the same are handled by adding a suffix of
> (1), etc. However, when two names differ only in case, the parser does not
> add the suffix, and so overwrites the file on disk. For example:
> FW Letter.msg
> FW letter.msg
> will result in only one attachment being extracted. Would it be possible to
> update the filename comparison to account for Windows file systems, which
> treat those two names as the same?
> Thanks!
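If names have to be de-duplicated in memory before anything is written, the
filename comparison the report asks about could use case-folded keys instead of
raw strings. This Java sketch is a hypothetical helper (`CaseInsensitiveNamer`,
`claim`), not Tika's API, and the `base(1).ext` suffix layout is assumed. It
folds names with per-code-point `Character.toUpperCase`, which is only an
approximation of how Windows compares names, so unusual characters may still
be treated differently than NTFS would treat them.

```java
import java.util.HashSet;
import java.util.Set;

public class CaseInsensitiveNamer {
    // Folded keys of every name handed out so far, so that "FW Letter.msg"
    // and "FW letter.msg" map to the same key.
    private final Set<String> usedKeys = new HashSet<>();

    // Approximation (assumption): simple per-code-point upper-casing. Unlike
    // String.toUpperCase, this never changes the length of the name.
    static String fold(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        name.codePoints().forEach(cp -> sb.appendCodePoint(Character.toUpperCase(cp)));
        return sb.toString();
    }

    // Returns name unchanged if no case-variant was used before, otherwise
    // base(1).ext, base(2).ext, ... until a free folded key is found.
    String claim(String name) {
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        String ext = dot > 0 ? name.substring(dot) : "";
        String candidate = name;
        for (int i = 1; !usedKeys.add(fold(candidate)); i++) {
            candidate = base + "(" + i + ")" + ext;
        }
        return candidate;
    }

    public static void main(String[] args) {
        CaseInsensitiveNamer namer = new CaseInsensitiveNamer();
        System.out.println(namer.claim("FW Letter.msg")); // FW Letter.msg
        System.out.println(namer.claim("FW letter.msg")); // FW letter(1).msg
        System.out.println(namer.claim("FW LETTER.MSG")); // FW LETTER(2).MSG
    }
}
```

Keeping the original spelling and only comparing folded keys means extracted
files retain the case the PDF used, while any two names Windows would consider
equal (under this approximation) still receive distinct suffixes.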