[ https://issues.apache.org/jira/browse/TIKA-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-4387:
------------------------------
Description:
{{FilenameUtils.getSuffixFromPath()}} isn't checking that the extension
contains only alphanumeric characters.
If a "file path" derives from an internal path in a pst, like so {{/Début du
fichier de données Outlook/[WEBINAR] - "Introducing Couchbase Server 2.5"}},
then the extension is {{.5"}}, which causes problems on Windows.
The problem happens when TemporaryResources goes to write a temp file and tries
to maintain the file extension based on the {{resourceName}} in the Metadata.
Should we add a check that the extension contains only alphanumeric
characters, or some similar validation?
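A minimal sketch of what such a check might look like, purely as an illustration and not the actual {{FilenameUtils}} code. The class {{SuffixSketch}}, the method {{safeSuffix}}, and the 10-character limit are all hypothetical; the idea is just to return an empty suffix whenever the text after the last dot is not purely ASCII alphanumeric.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of Tika.
class SuffixSketch {

    // Assumption: a safe extension is a dot followed by 1 to 10 ASCII letters or digits.
    private static final Pattern SAFE_SUFFIX = Pattern.compile("\\.[A-Za-z0-9]{1,10}");

    /** Returns the suffix including the leading dot, or "" if no safe suffix is found. */
    static String safeSuffix(String path) {
        if (path == null) {
            return "";
        }
        // Only consider the last path segment, for either separator style.
        int sep = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        String name = path.substring(sep + 1);
        int dot = name.lastIndexOf('.');
        if (dot < 0) {
            return "";
        }
        String suffix = name.substring(dot);
        // Reject anything with quotes, spaces, or other non-alphanumeric characters.
        return SAFE_SUFFIX.matcher(suffix).matches() ? suffix : "";
    }

    public static void main(String[] args) {
        // Shortened form of the pst-internal path from the description.
        String pstPath = "/Outlook/[WEBINAR] - \"Introducing Couchbase Server 2.5\"";
        System.out.println("[" + safeSuffix(pstPath) + "]");          // prints []
        System.out.println("[" + safeSuffix("/a/b/report.pdf") + "]"); // prints [.pdf]
    }
}
```

With something along these lines, the {{.5"}} case would fall back to no extension, so the temp file would presumably get whatever default suffix the caller uses instead of one containing a double quote.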
was:
{{FilenameUtils.getSuffixFromPath()}} isn't checking that the extension
contains only alphanumeric characters.
If a "file path" derives from an internal path in a pst, like so {{/Début du
fichier de données Outlook/[WEBINAR] - "Introducing Couchbase Server 2.5"}},
then the extension is {{.5"}}, which causes problems on Windows.
> Improve robustness of file extension parsing
> --------------------------------------------
>
> Key: TIKA-4387
> URL: https://issues.apache.org/jira/browse/TIKA-4387
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major