[ https://issues.apache.org/jira/browse/NIFI-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matt Gilman updated NIFI-16000:
-------------------------------
Description:
`org.apache.nifi.util.file.FileUtils.getSanitizedFilename(String)` treats the
space character (code point `32`) as invalid and replaces it with an
underscore. The invalid-character list was originally derived from a
cross-platform "invalid filename characters" reference, but the space character
is legal on every major file system (NTFS, ext4, APFS, etc.).
This becomes a usability problem because of how the method is consumed. Both
`ConnectorResource` and `ParameterContextResource` use it as a strict
validation gate for the asset name supplied in the `Filename` request header:
{code:java}
final String sanitizedAssetName = FileUtils.getSanitizedFilename(assetName);
if (!assetName.equals(sanitizedAssetName)) {
    throw new IllegalArgumentException(FILENAME_HEADER + " header contains an invalid file name");
}
{code}
The pattern is "sanitize, then reject if sanitization changed anything."
Because any name containing a space is rewritten during sanitization, the
equality check fails and the upload is rejected. As a result, common, perfectly
valid filenames cannot be uploaded as assets. For example, a file produced by
browser/OS download de-duplication such as {{driver (1).jar}} is sanitized to
{{driver_(1).jar}}, which differs from the original and is therefore rejected
with _"... header contains an invalid file name."_
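The rejection can be reproduced in isolation with a minimal sketch. The class `SpaceRejectionSketch`, its `INVALID` set, and the `isAccepted` helper are hypothetical stand-ins written for illustration; the actual character list in `FileUtils` may differ, and only the "space is treated as invalid" behavior is taken from this issue.

```java
import java.util.Set;

final class SpaceRejectionSketch {
    // Hypothetical invalid-character set (assumption); it includes the space,
    // mirroring the behavior described in this issue.
    private static final Set<Character> INVALID =
            Set.of('"', '<', '>', '|', ':', '*', '?', '\\', '/', ' ');

    // Replace control characters (below code point 32) and listed characters with '_'.
    static String sanitize(final String name) {
        final StringBuilder sb = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            final char c = name.charAt(i);
            sb.append(c < 32 || INVALID.contains(c) ? '_' : c);
        }
        return sb.toString();
    }

    // The call-site gate: accept only names that sanitization leaves unchanged.
    static boolean isAccepted(final String assetName) {
        return assetName.equals(sanitize(assetName));
    }

    public static void main(final String[] args) {
        System.out.println(sanitize("driver (1).jar"));   // driver_(1).jar
        System.out.println(isAccepted("driver (1).jar")); // false
        System.out.println(isAccepted("driver.jar"));     // true
    }
}
```

Because the gate compares the input with its sanitized form, any character in the invalid set causes a rejection rather than a silent rewrite.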
**Proposed change**
Remove the space character (`32`) from the invalid-character set so spaces are
preserved rather than replaced. Spaces are left exactly as supplied — including
leading, trailing, repeated, and interior spaces — and no other normalization
is performed. All other characters continue to be sanitized as before.
**Examples (after change)**
| Input | Output |
| {{driver (1).jar}} | {{driver (1).jar}} |
| {{my report.txt}} | {{my report.txt}} |
| {{driver (1).jar}} | {{driver (1).jar}} |
| {{a/b\c}} | {{a_b_c}} |
| {{name:}} | {{name_}} |
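The rows above can be checked against a small sketch of the proposed behavior. The class `SpacePreservingSketch` and its `INVALID` set are hypothetical and written for illustration only; the sketch assumes the remaining invalid characters include `/`, `\`, and `:` as the table implies, and it is not the actual `FileUtils` code.

```java
import java.util.Set;

final class SpacePreservingSketch {
    // Hypothetical invalid-character set (assumption) with the space removed.
    private static final Set<Character> INVALID =
            Set.of('"', '<', '>', '|', ':', '*', '?', '\\', '/');

    // Replace control characters and listed characters with '_'; spaces pass through.
    static String sanitize(final String name) {
        final StringBuilder sb = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            final char c = name.charAt(i);
            sb.append(c < 32 || INVALID.contains(c) ? '_' : c);
        }
        return sb.toString();
    }

    public static void main(final String[] args) {
        System.out.println(sanitize("driver (1).jar")); // driver (1).jar
        System.out.println(sanitize("my report.txt"));  // my report.txt
        System.out.println(sanitize("a/b\\c"));         // a_b_c
        System.out.println(sanitize("name:"));          // name_
    }
}
```

Leading, trailing, and repeated spaces are returned unchanged as well, since the sketch performs no normalization beyond the per-character replacement.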
**Backward compatibility**
The change is backward compatible: any filename that contained no spaces is
sanitized exactly as before. The only behavioral change is that the space
character is now preserved instead of being replaced with an underscore, so
filenames whose sole issue was a space are now accepted by the asset-upload
callers instead of being rejected.
was:
`org.apache.nifi.util.file.FileUtils.getSanitizedFilename(String)` treats the
space character (code point `32`) as invalid and replaces it with an
underscore. This list was originally derived from a cross-platform "invalid
filename characters" reference, but the space character is legal on every major
file system (NTFS, ext4, APFS, etc.).
This becomes a usability problem because of how the method is consumed. Both
`ConnectorResource` and `ParameterContextResource` use it as a strict
validation gate for the asset name supplied in the `Filename` request header:
{code:java}
final String sanitizedAssetName = FileUtils.getSanitizedFilename(assetName);
if (!assetName.equals(sanitizedAssetName)) {
    throw new IllegalArgumentException(FILENAME_HEADER + " header contains an invalid file name");
}
{code}
Because any name containing a space is rewritten during sanitization, the
equality check fails and the upload is rejected. As a result, common, perfectly
valid filenames cannot be uploaded as assets. For example, a file produced by
browser/OS download de-duplication such as {{driver (1).jar}} is sanitized to
{{driver_(1).jar}}, which differs from the original and is therefore rejected
with _"... header contains an invalid file name."_
**Proposed change**
Permit spaces within a filename while keeping the result canonical and
file-system-safe:
* Remove the space character (`32`) from the invalid-character set so interior
spaces are preserved.
* After the existing per-character replacement, normalize the result by
collapsing interior whitespace runs to a single space, stripping
leading/trailing whitespace, and removing trailing dots.
This preserves the existing "sanitize, then reject if the name changed"
contract at the call sites (a non-canonical name such as a leading/trailing
space or a trailing dot is still rejected), while allowing legitimate names
that merely contain interior spaces. It also avoids the ambiguous edge cases
that simply accepting spaces would introduce (leading/trailing spaces, repeated
spaces, trailing dots, and whitespace-only names — the latter of which can
collide on Windows, where trailing spaces/dots are silently stripped).
**Examples (after change)**
| Input | Output | Accepted by callers? |
| {{driver (1).jar}} | {{driver (1).jar}} | Yes |
| {{driver  (1).jar}} (repeated spaces) | {{driver (1).jar}} | No (non-canonical) |
| {{ driver (1).jar }} (leading/trailing) | {{driver (1).jar}} | No (non-canonical) |
| {{report...}} (trailing dots) | {{report}} | No (non-canonical) |
| {{a/b\c}} | {{a_b_c}} | No (non-canonical) |
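For comparison, the normalizing variant described in this earlier text could look roughly like the sketch below. The class `CanonicalNameSketch`, its `INVALID` set, and the way the three normalization steps are combined are assumptions made for illustration; in particular, trailing dots and trailing spaces are removed together in one pass so that a name such as `report .` does not end in a space.

```java
import java.util.Set;

final class CanonicalNameSketch {
    // Hypothetical invalid-character set (assumption); the space is not included.
    private static final Set<Character> INVALID =
            Set.of('"', '<', '>', '|', ':', '*', '?', '\\', '/');

    // Step 1: per-character replacement of control and listed characters with '_'.
    static String replaceInvalid(final String name) {
        final StringBuilder sb = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            final char c = name.charAt(i);
            sb.append(c < 32 || INVALID.contains(c) ? '_' : c);
        }
        return sb.toString();
    }

    // Step 2: collapse whitespace runs, strip leading whitespace,
    // then drop any trailing mix of dots and spaces.
    static String sanitize(final String name) {
        final String collapsed = replaceInvalid(name).replaceAll("\\s+", " ");
        return collapsed.stripLeading().replaceAll("[. ]+$", "");
    }

    // The unchanged call-site contract: reject if sanitization altered the name.
    static boolean isAccepted(final String assetName) {
        return assetName.equals(sanitize(assetName));
    }

    public static void main(final String[] args) {
        System.out.println(isAccepted("driver (1).jar"));   // true
        System.out.println(isAccepted("driver  (1).jar"));  // false
        System.out.println(isAccepted(" driver (1).jar ")); // false
        System.out.println(sanitize("report..."));          // report
    }
}
```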
**Backward compatibility**
The change is backward compatible: names that previously sanitized cleanly
continue to do so, and the only behavioral change is that filenames whose sole
issue was an interior space are now accepted instead of being rewritten.
> FileUtils.getSanitizedFilename rejects filenames containing spaces
> ------------------------------------------------------------------
>
> Key: NIFI-16000
> URL: https://issues.apache.org/jira/browse/NIFI-16000
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Reporter: Matt Gilman
> Assignee: Matt Gilman
> Priority: Major