DC Posch created FILEUPLOAD-305:
-----------------------------------
Summary: FileItem.get() returns garbled byte stream for binary
files
Key: FILEUPLOAD-305
URL: https://issues.apache.org/jira/browse/FILEUPLOAD-305
Project: Commons FileUpload
Issue Type: Bug
Affects Versions: 1.4
Environment: Server: Jetty 9.4.26
JVM: openjdk 11.0.7
OS: macOS Catalina
Reporter: DC Posch
Attachments: check.png
*Summary*
FileItem.get() claims to return "the contents of the file item as an array of
bytes."
FileItem.getInputStream() claims to return "an InputStream that can be used to
retrieve the contents of the file."
When uploading a multipart form-encoded binary file, get() returns garbled
results. Specifically, many byte sequences are replaced with 0xEF 0xBF 0xBD,
the UTF-8 representation of the Unicode replacement character �.
This suggests that get() is attempting to deserialize file contents as UTF-8
text, rather than returning the raw contents. This is a trap and does not match
the documentation.
Meanwhile, getInputStream() yields correct results.
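The symptom can be reproduced without FileUpload at all. The following is a minimal sketch, not the library's actual code path: it assumes only that the bytes pass through a UTF-8 decode and re-encode somewhere, and the class and method names (Utf8RoundTripDemo, roundTrip) are hypothetical. It pushes the 8-byte PNG signature through such a round trip.

```java
import java.nio.charset.StandardCharsets;

public class Utf8RoundTripDemo {
    // Decode raw bytes as UTF-8 text, then encode that text back to bytes.
    // Bytes that are not valid UTF-8 become U+FFFD during decoding, and
    // U+FFFD encodes to the three bytes 0xEF 0xBF 0xBD.
    static byte[] roundTrip(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The 8-byte signature that starts every PNG file.
        // 0x89 is not a valid UTF-8 lead byte.
        byte[] pngSignature = {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};
        byte[] mangled = roundTrip(pngSignature);

        System.out.println("original length:   " + pngSignature.length); // 8
        System.out.println("round-trip length: " + mangled.length);      // 10

        // Dump the mangled bytes as hex to make the replacement visible.
        StringBuilder hex = new StringBuilder();
        for (byte b : mangled) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim());
    }
}
```

The single invalid byte 0x89 turns into three bytes, so the array grows from 8 to 10 bytes. The same kind of growth would be consistent with a 208-byte PNG coming back as 348 bytes.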
*Steps to reproduce*
Upload a multipart form-encoded payload to the following "hello world" request
handler.
Include just a single part, with the 208 byte PNG file attached to this issue.
Minimal request handler:
{code:java}
private final void receive(HttpServletRequest req, HttpServletResponse resp)
        throws Exception {
    DiskFileItemFactory factory = new DiskFileItemFactory();
    ServletFileUpload upload = new ServletFileUpload(factory);
    // Take the first (and only) part of the multipart request.
    FileItem item = upload.parseRequest(req).get(0);
    System.out.println("content-type: " + item.getContentType());
    System.out.println("# of bytes via get(): " + item.get().length);
    // ByteStreams is presumably Guava's com.google.common.io.ByteStreams.
    // System.out.println("# of bytes via getInputStream(): "
    //         + ByteStreams.toByteArray(item.getInputStream()).length);
}
{code}
If you print the byte count via get(), you'll see 348, which is incorrect.
If you print the byte count via getInputStream(), you'll see the correct 208
bytes.
If you go further and print out the exact bytes returned by get() and view in a
hex editor, you'll see the 0xEF 0xBF 0xBD replacement character inserted in
many spots.
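As an alternative to a hex editor, the replacement sequences can be counted in code. This is a small sketch with hypothetical names (ReplacementScan, countReplacementSequences); it simply scans a byte array for the 0xEF 0xBF 0xBD triple and could be called on the result of item.get().

```java
public class ReplacementScan {
    // Count non-overlapping occurrences of 0xEF 0xBF 0xBD, the UTF-8
    // encoding of the Unicode replacement character U+FFFD.
    static int countReplacementSequences(byte[] data) {
        int count = 0;
        for (int i = 0; i + 2 < data.length; i++) {
            if ((data[i] & 0xFF) == 0xEF
                    && (data[i + 1] & 0xFF) == 0xBF
                    && (data[i + 2] & 0xFF) == 0xBD) {
                count++;
                i += 2; // skip the remaining two bytes of this match
            }
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] sample = {
            (byte) 0xEF, (byte) 0xBF, (byte) 0xBD, 'P', 'N', 'G',
            (byte) 0xEF, (byte) 0xBF, (byte) 0xBD
        };
        System.out.println("replacement sequences: "
                + countReplacementSequences(sample)); // 2
    }
}
```

A non-zero count on a binary payload is a strong hint, though not proof, that the data went through a lossy text decode, since a file can legitimately contain that byte triple.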
--
This message was sent by Atlassian Jira
(v8.3.4#803005)