[ https://issues.apache.org/jira/browse/FILEUPLOAD-305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
DC Posch updated FILEUPLOAD-305:
--------------------------------
Description:
h3. *Summary*
FileItem.get() claims to return "the contents of the file item as an array of
bytes."
FileItem.getInputStream() claims to return "an InputStream that can be used to
retrieve the contents of the file." That suggests they should behave
identically.
*However, when uploading a binary file as multipart/form-data, get() returns
garbled results.* Specifically, many byte sequences are replaced with 0xEF 0xBF
0xBD, the UTF-8 encoding of the Unicode replacement character � (U+FFFD).
This suggests that get() is decoding the file contents as UTF-8 text and
re-encoding them, rather than returning the raw contents. This is a trap and
does not match the documentation.
Meanwhile, getInputStream() yields correct results.
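To illustrate the suspected mechanism, here is a minimal standalone sketch that
uses only the JDK. It does not call Commons FileUpload and is not taken from its
source; the class and method names (Utf8RoundTripDemo, roundTrip, toHex) are
hypothetical. It only shows that decoding arbitrary binary data as UTF-8 and
encoding it again produces the same kind of 0xEF 0xBF 0xBD corruption, under the
assumption that something similar happens on the get() path.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Commons FileUpload.
final class Utf8RoundTripDemo {

    /** Decodes raw bytes as UTF-8 text, then encodes that text back to bytes. */
    static byte[] roundTrip(byte[] raw) {
        // Malformed input is replaced with U+FFFD during decoding.
        String decoded = new String(raw, StandardCharsets.UTF_8);
        // U+FFFD is encoded as the three bytes EF BF BD.
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    /** Formats bytes as space-separated uppercase hex pairs. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The standard 8-byte PNG signature; 0x89 is not a valid UTF-8 lead byte.
        byte[] pngSignature = {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};
        byte[] garbled = roundTrip(pngSignature);
        System.out.println("raw:        " + toHex(pngSignature));
        System.out.println("round trip: " + toHex(garbled));
    }
}
```

With the PNG signature as input, the 8 raw bytes come back as 10 bytes that begin
with EF BF BD, because the single invalid byte 0x89 is replaced by a three-byte
sequence. The same effect applied across a whole file would make the result
longer than the original.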
h3. *Steps to reproduce*
Upload a multipart/form-data payload to the following "hello world" request
handler.
Include just a single part, with the 208-byte PNG file attached to this issue
(check.png).
Minimal request handler:
{code:java}
private final void receive(HttpServletRequest req, HttpServletResponse resp)
        throws Exception {
    DiskFileItemFactory factory = new DiskFileItemFactory();
    ServletFileUpload upload = new ServletFileUpload(factory);
    // The request carries a single part: the attached PNG.
    FileItem item = upload.parseRequest(req).get(0);
    System.out.println("content-type: " + item.getContentType());
    System.out.println("# of bytes via get(): " + item.get().length);
    // Alternative: read the same item through its stream (ByteStreams is from Guava).
    // System.out.println("# of bytes via getInputStream(): "
    //         + ByteStreams.toByteArray(item.getInputStream()).length);
}
{code}
If you print the byte count via get(), you'll see 348, which is incorrect.
If you print the byte count via getInputStream(), you'll see the correct 208
bytes.
If you go further, dump the exact bytes returned by get(), and view them in a
hex editor, you'll see the 0xEF 0xBF 0xBD replacement sequence inserted in
many spots.
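As an alternative to inspecting the dump by hand in a hex editor, the check can
be automated. The following is a rough sketch using only the JDK; the class and
method names (ReplacementScan, countReplacementSequences) are hypothetical
helpers, not part of Commons FileUpload. It counts non-overlapping occurrences
of EF BF BD in a byte array, such as the one returned by item.get().

```java
// Hypothetical helper; not part of Commons FileUpload.
final class ReplacementScan {

    /** Counts non-overlapping occurrences of the byte sequence EF BF BD. */
    static int countReplacementSequences(byte[] data) {
        int count = 0;
        for (int i = 0; i + 2 < data.length; i++) {
            boolean match = (data[i] & 0xFF) == 0xEF
                    && (data[i + 1] & 0xFF) == 0xBF
                    && (data[i + 2] & 0xFF) == 0xBD;
            if (match) {
                count++;
                i += 2; // skip the remaining two bytes of this match
            }
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xEF, (byte) 0xBF, (byte) 0xBD, 0x50, 0x4E, 0x47};
        System.out.println("replacement sequences: " + countReplacementSequences(sample));
    }
}
```

A non-zero count is only a hint, since a genuine binary file could contain
EF BF BD by coincidence. Comparing the count for the get() result against the
count for the bytes read from getInputStream() gives a clearer signal.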
> FileItem.get() returns garbled byte stream for binary files
> -----------------------------------------------------------
>
> Key: FILEUPLOAD-305
> URL: https://issues.apache.org/jira/browse/FILEUPLOAD-305
> Project: Commons FileUpload
> Issue Type: Bug
> Affects Versions: 1.4
> Environment: Server: Jetty 9.4.26
> JVM: openjdk 11.0.7
> OS: macOS Catalina
> Reporter: DC Posch
> Priority: Major
> Attachments: check.png
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)