[
https://issues.apache.org/jira/browse/NIFI-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matt Gilman updated NIFI-16000:
-------------------------------
Status: Patch Available (was: Open)
> FileUtils.getSanitizedFilename rejects filenames containing spaces
> ------------------------------------------------------------------
>
> Key: NIFI-16000
> URL: https://issues.apache.org/jira/browse/NIFI-16000
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Reporter: Matt Gilman
> Assignee: Matt Gilman
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> `org.apache.nifi.util.file.FileUtils.getSanitizedFilename(String)` treats the
> space character (code point `32`) as invalid and replaces it with an
> underscore. The method's invalid-character list was originally derived from a
> cross-platform "invalid filename characters" reference, but the space
> character is legal on every
> major file system (NTFS, ext4, APFS, etc.).
> This becomes a usability problem because of how the method is consumed. Both
> `ConnectorResource` and `ParameterContextResource` use it as a strict
> validation gate for the asset name supplied in the `Filename` request header:
> {code:java}
> final String sanitizedAssetName = FileUtils.getSanitizedFilename(assetName);
> if (!assetName.equals(sanitizedAssetName)) {
>     throw new IllegalArgumentException(FILENAME_HEADER + " header contains an invalid file name");
> }
> {code}
> The pattern is "sanitize, then reject if sanitization changed anything."
> Because any name containing a space is rewritten during sanitization, the
> equality check fails and the upload is rejected. As a result, common,
> perfectly valid filenames cannot be uploaded as assets. For example, a file
> produced by browser/OS download de-duplication such as {{driver (1).jar}} is
> sanitized to {{driver_(1).jar}}, which differs from the original and is
> therefore rejected with _"... header contains an invalid file name."_
> **Proposed change**
> Remove the space character (code point `32`) from the invalid-character set
> so spaces are preserved rather than replaced. Spaces are left exactly as
> supplied (leading, trailing, repeated, and interior), and no other
> normalization is performed. All other characters continue to be sanitized as
> before.
> **Examples (after change)**
> | Input | Output |
> | {{driver (1).jar}} | {{driver (1).jar}} |
> | {{my report.txt}} | {{my report.txt}} |
> | {{a/b\c}} | {{a_b_c}} |
> | {{name:}} | {{name_}} |
> **Backward compatibility**
> The change is backward compatible: any filename that contained no spaces is
> sanitized exactly as before. The only behavioral change is that the space
> character is now preserved instead of being replaced with an underscore, so
> filenames whose sole issue was a space are now accepted by the asset-upload
> callers instead of being rejected.
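The behavior described above can be sketched roughly as follows. This is an illustration only, not the NiFi implementation: the class `FilenameSketch`, the methods `sanitize` and `isAccepted`, and the contents of the `INVALID` set are all hypothetical, and the real invalid-character list in `FileUtils` may differ. The sketch assumes that invalid characters are replaced one-for-one with an underscore and that the space character is simply absent from the invalid set.

```java
import java.util.Set;

// Illustrative sketch only; names and the character set are assumptions, not NiFi code.
final class FilenameSketch {

    // Assumed subset of reserved characters. The space character (code point 32) is NOT included.
    private static final Set<Character> INVALID =
            Set.of('/', '\\', ':', '*', '?', '"', '<', '>', '|');

    // Replaces each invalid character (and, by assumption, each control character
    // below code point 32) with an underscore. Spaces pass through unchanged.
    static String sanitize(final String name) {
        final StringBuilder sanitized = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            final char c = name.charAt(i);
            sanitized.append(c < 32 || INVALID.contains(c) ? '_' : c);
        }
        return sanitized.toString();
    }

    // Mirrors the "sanitize, then reject if sanitization changed anything" gate.
    static boolean isAccepted(final String name) {
        return name.equals(sanitize(name));
    }

    public static void main(final String[] args) {
        System.out.println(sanitize("driver (1).jar"));   // space preserved
        System.out.println(sanitize("a/b\\c"));           // separators replaced
        System.out.println(isAccepted("driver (1).jar")); // passes the gate
        System.out.println(isAccepted("name:"));          // still rejected
    }
}
```

Under these assumptions, a name whose only questionable character is a space survives sanitization unchanged and therefore passes the equality check, while names containing other reserved characters are still rewritten and rejected.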
--
This message was sent by Atlassian Jira
(v8.20.10#820010)