[ 
https://issues.apache.org/jira/browse/NIFI-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Gilman updated NIFI-16000:
-------------------------------
    Status: Patch Available  (was: Open)

> FileUtils.getSanitizedFilename rejects filenames containing spaces
> ------------------------------------------------------------------
>
>                 Key: NIFI-16000
>                 URL: https://issues.apache.org/jira/browse/NIFI-16000
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>            Reporter: Matt Gilman
>            Assignee: Matt Gilman
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> `org.apache.nifi.util.file.FileUtils.getSanitizedFilename(String)` treats the 
> space character (code point `32`) as invalid and replaces it with an 
> underscore. The method's invalid-character list was originally derived from 
> a cross-platform "invalid filename characters" reference, but the space 
> character is legal on every major file system (NTFS, ext4, APFS, etc.).
> This becomes a usability problem because of how the method is consumed. Both 
> `ConnectorResource` and `ParameterContextResource` use it as a strict 
> validation gate for the asset name supplied in the `Filename` request header:
> {code:java}
> final String sanitizedAssetName = FileUtils.getSanitizedFilename(assetName);
> if (!assetName.equals(sanitizedAssetName)) {
>     throw new IllegalArgumentException(FILENAME_HEADER + " header contains an invalid file name");
> }
> {code}
> The pattern is "sanitize, then reject if sanitization changed anything." 
> Because any name containing a space is rewritten during sanitization, the 
> equality check fails and the upload is rejected. As a result, common, 
> perfectly valid filenames cannot be uploaded as assets. For example, a file 
> produced by browser/OS download de-duplication such as {{driver (1).jar}} is 
> sanitized to {{driver_(1).jar}}, which differs from the original and is 
> therefore rejected with _"... header contains an invalid file name."_
> **Proposed change**
> Remove the space character (`32`) from the invalid-character set so spaces 
> are preserved rather than replaced. Spaces are left exactly as supplied — 
> including leading, trailing, repeated, and interior spaces — and no other 
> normalization is performed. All other characters continue to be sanitized as 
> before.
> **Examples (after change)**
> || Input || Output ||
> | {{driver (1).jar}} | {{driver (1).jar}} |
> | {{my report.txt}} | {{my report.txt}} |
> | {{driver   (1).jar}} | {{driver   (1).jar}} |
> | {{a/b\c}} | {{a_b_c}} |
> | {{name:}} | {{name_}} |
> **Backward compatibility**
> The change is backward compatible: any filename that contained no spaces is 
> sanitized exactly as before. The only behavioral change is that the space 
> character is now preserved instead of being replaced with an underscore, so 
> filenames whose sole issue was a space are now accepted by the asset-upload 
> callers instead of being rejected.
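
The behavior described above can be sketched roughly as follows. This is an illustration only, not the actual NiFi implementation: the class name FilenameSanitizerSketch, the methods sanitize and requireValidAssetName, the contents of INVALID_CHARS, and the value of FILENAME_HEADER are all assumptions made for the example. The one property taken from the description is that the space character is not in the invalid set, so the "sanitize, then reject if anything changed" gate accepts names that contain spaces.

```java
// Hypothetical sketch; names and the invalid-character set are assumptions,
// not the real org.apache.nifi.util.file.FileUtils code.
final class FilenameSanitizerSketch {

    // Assumed invalid set for illustration: characters reserved on at least
    // one major file system. Space (code point 32) is deliberately absent,
    // which is the change proposed in this issue.
    private static final String INVALID_CHARS = "\"*/:<>?\\|";

    // Assumed header name, taken from the "Filename" header mentioned above.
    private static final String FILENAME_HEADER = "Filename";

    // Replaces control characters (below code point 32) and every character
    // in INVALID_CHARS with an underscore. All other characters, including
    // leading, trailing, repeated, and interior spaces, are kept as supplied.
    static String sanitize(final String filename) {
        final StringBuilder sanitized = new StringBuilder(filename.length());
        for (int i = 0; i < filename.length(); i++) {
            final char c = filename.charAt(i);
            final boolean invalid = c < 32 || INVALID_CHARS.indexOf(c) >= 0;
            sanitized.append(invalid ? '_' : c);
        }
        return sanitized.toString();
    }

    // Mirrors the validation gate quoted in the description: sanitize the
    // name, then reject it if sanitization changed anything.
    static String requireValidAssetName(final String assetName) {
        final String sanitizedAssetName = sanitize(assetName);
        if (!assetName.equals(sanitizedAssetName)) {
            throw new IllegalArgumentException(FILENAME_HEADER + " header contains an invalid file name");
        }
        return assetName;
    }

    public static void main(final String[] args) {
        System.out.println(sanitize("driver (1).jar"));             // driver (1).jar
        System.out.println(sanitize("a/b\\c"));                     // a_b_c
        System.out.println(requireValidAssetName("my report.txt")); // my report.txt
    }
}
```

With this sketch, a name such as "name:" is still rewritten to "name_" and is therefore still rejected by the gate, while "driver (1).jar" passes through unchanged.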



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
