Re: [PR] feat: Add incremental `toByteArray` method [commons-io]

via GitHub Fri, 05 Sep 2025 12:34:16 -0700


garydgregory commented on code in PR #776:
URL: https://github.com/apache/commons-io/pull/776#discussion_r2325862574



##########
src/main/java/org/apache/commons/io/IOUtils.java:
##########
@@ -2637,57 +2638,61 @@ public static BufferedReader toBufferedReader(final 
Reader reader, final int siz
     }
 
     /**
-     * Gets the contents of an {@link InputStream} as a {@code byte[]}.
-     * <p>
-     * This method buffers the input internally, so there is no need to use a 
{@link BufferedInputStream}.
-     * </p>
+     * Reads all remaining bytes from the given {@link InputStream} into a new 
{@code byte[]}.
      *
-     * @param inputStream the {@link InputStream} to read.
-     * @return the requested byte array.
-     * @throws NullPointerException if the InputStream is {@code null}.
-     * @throws IOException          if an I/O error occurs or reading more 
than {@link Integer#MAX_VALUE} occurs.
+     * <p>The method accumulates the data in temporary buffers and returns a 
single array
+     * containing the entire contents once the end of the stream is 
reached.</p>
+     *
+     * @param input the {@link InputStream} to read; must not be {@code null}.
+     * @return a new byte array.
+     * @throws IllegalArgumentException if the size of the stream is greater 
than {@code Integer.MAX_VALUE}.
+     * @throws IOException          if an I/O error occurs while reading.
+     * @throws NullPointerException if {@code input} is {@code null}.
      */
-    public static byte[] toByteArray(final InputStream inputStream) throws 
IOException {
+    public static byte[] toByteArray(final InputStream input) throws 
IOException {
         // We use a ThresholdingOutputStream to avoid reading AND writing more 
than Integer.MAX_VALUE.
         try (UnsynchronizedByteArrayOutputStream ubaOutput = 
UnsynchronizedByteArrayOutputStream.builder().get();
                 ThresholdingOutputStream thresholdOutput = new 
ThresholdingOutputStream(Integer.MAX_VALUE, os -> {
                     throw new IllegalArgumentException(String.format("Cannot 
read more than %,d into a byte array", Integer.MAX_VALUE));
                 }, os -> ubaOutput)) {
-            copy(inputStream, thresholdOutput);
+            copy(input, thresholdOutput);
             return ubaOutput.toByteArray();
         }
     }
 
     /**
-     * Gets the contents of an {@link InputStream} as a {@code byte[]}. Use 
this method instead of
-     * {@link #toByteArray(InputStream)} when {@link InputStream} size is 
known.
+     * Reads exactly {@code size} bytes from the given {@link InputStream} 
into a new {@code byte[]}.
      *
-     * @param input the {@link InputStream} to read.
-     * @param size the size of {@link InputStream} to read, where 0 &lt; 
{@code size} &lt;= length of input stream.
-     * @return byte [] of length {@code size}.
-     * @throws IOException if an I/O error occurs or {@link InputStream} 
length is smaller than parameter {@code size}.
-     * @throws IllegalArgumentException if {@code size} is less than zero.
+     * <p>The method allocates a single array of the requested size and fills 
it directly

Review Comment:
   This comment is not needed IMO, the user cannot affect the behavior by 
passing in different values. It's a low-level implementation detail.



##########
src/main/java/org/apache/commons/io/IOUtils.java:
##########
@@ -2637,57 +2638,61 @@ public static BufferedReader toBufferedReader(final 
Reader reader, final int siz
     }
 
     /**
-     * Gets the contents of an {@link InputStream} as a {@code byte[]}.
-     * <p>
-     * This method buffers the input internally, so there is no need to use a 
{@link BufferedInputStream}.
-     * </p>
+     * Reads all remaining bytes from the given {@link InputStream} into a new 
{@code byte[]}.
      *
-     * @param inputStream the {@link InputStream} to read.
-     * @return the requested byte array.
-     * @throws NullPointerException if the InputStream is {@code null}.
-     * @throws IOException          if an I/O error occurs or reading more 
than {@link Integer#MAX_VALUE} occurs.
+     * <p>The method accumulates the data in temporary buffers and returns a 
single array

Review Comment:
   This comment is not needed IMO, the user cannot affect the behavior about it 
by passing in different values. It's a low-level implementation detail.



##########
src/main/java/org/apache/commons/io/IOUtils.java:
##########
@@ -2637,57 +2638,61 @@ public static BufferedReader toBufferedReader(final 
Reader reader, final int siz
     }
 
     /**
-     * Gets the contents of an {@link InputStream} as a {@code byte[]}.
-     * <p>
-     * This method buffers the input internally, so there is no need to use a 
{@link BufferedInputStream}.
-     * </p>
+     * Reads all remaining bytes from the given {@link InputStream} into a new 
{@code byte[]}.
      *
-     * @param inputStream the {@link InputStream} to read.
-     * @return the requested byte array.
-     * @throws NullPointerException if the InputStream is {@code null}.
-     * @throws IOException          if an I/O error occurs or reading more 
than {@link Integer#MAX_VALUE} occurs.
+     * <p>The method accumulates the data in temporary buffers and returns a 
single array
+     * containing the entire contents once the end of the stream is 
reached.</p>
+     *
+     * @param input the {@link InputStream} to read; must not be {@code null}.
+     * @return a new byte array.
+     * @throws IllegalArgumentException if the size of the stream is greater 
than {@code Integer.MAX_VALUE}.
+     * @throws IOException          if an I/O error occurs while reading.
+     * @throws NullPointerException if {@code input} is {@code null}.
      */
-    public static byte[] toByteArray(final InputStream inputStream) throws 
IOException {
+    public static byte[] toByteArray(final InputStream input) throws 
IOException {
         // We use a ThresholdingOutputStream to avoid reading AND writing more 
than Integer.MAX_VALUE.
         try (UnsynchronizedByteArrayOutputStream ubaOutput = 
UnsynchronizedByteArrayOutputStream.builder().get();
                 ThresholdingOutputStream thresholdOutput = new 
ThresholdingOutputStream(Integer.MAX_VALUE, os -> {
                     throw new IllegalArgumentException(String.format("Cannot 
read more than %,d into a byte array", Integer.MAX_VALUE));
                 }, os -> ubaOutput)) {
-            copy(inputStream, thresholdOutput);
+            copy(input, thresholdOutput);
             return ubaOutput.toByteArray();
         }
     }
 
     /**
-     * Gets the contents of an {@link InputStream} as a {@code byte[]}. Use 
this method instead of
-     * {@link #toByteArray(InputStream)} when {@link InputStream} size is 
known.
+     * Reads exactly {@code size} bytes from the given {@link InputStream} 
into a new {@code byte[]}.
      *
-     * @param input the {@link InputStream} to read.
-     * @param size the size of {@link InputStream} to read, where 0 &lt; 
{@code size} &lt;= length of input stream.
-     * @return byte [] of length {@code size}.
-     * @throws IOException if an I/O error occurs or {@link InputStream} 
length is smaller than parameter {@code size}.
-     * @throws IllegalArgumentException if {@code size} is less than zero.
+     * <p>The method allocates a single array of the requested size and fills 
it directly
+     * from the stream.</p>
+     *
+     * @param input the {@link InputStream} to read; must not be {@code null}.
+     * @param size  the exact number of bytes to read; must be {@code >= 0}.
+     * @return a new byte array of length {@code size}.
+     * @throws IllegalArgumentException if {@code size} is negative.
+     * @throws EOFException             if the stream ends before {@code size} 
bytes are read.
+     * @throws IOException              if an I/O error occurs while reading.
+     * @throws NullPointerException     if {@code input} is {@code null}.
      * @since 2.1
      */
-    @SuppressWarnings("resource")
     public static byte[] toByteArray(final InputStream input, final int size) 
throws IOException {
         return toByteArray(Objects.requireNonNull(input, "input")::read, size);
     }
 
     /**
-     * Gets contents of an {@link InputStream} as a {@code byte[]}.
-     * Use this method instead of {@link #toByteArray(InputStream)}
-     * when {@link InputStream} size is known.
-     * <strong>NOTE:</strong> the method checks that the length can safely be 
cast to an int without truncation
-     * before using {@link IOUtils#toByteArray(InputStream, int)} to read into 
the byte array.
-     * (Arrays can have no more than Integer.MAX_VALUE entries anyway.)
+     * Reads exactly {@code size} bytes from the given {@link InputStream} 
into a new {@code byte[]}.
      *
-     * @param input the {@link InputStream} to read.
-     * @param size the size of {@link InputStream} to read, where 0 &lt; 
{@code size} &lt;= min(Integer.MAX_VALUE, length of input stream).
-     * @return byte [] the requested byte array, of length {@code size}.
-     * @throws IOException              if an I/O error occurs or {@link 
InputStream} length is less than {@code size}.
-     * @throws IllegalArgumentException if size is less than zero or size is 
greater than Integer.MAX_VALUE.
-     * @see IOUtils#toByteArray(InputStream, int)
+     * <p>The method allocates a single array of the requested size and fills 
it directly

Review Comment:
   This comment is not needed IMO, the user cannot affect the behavior by 
passing in different values. It's a low-level implementation detail.



##########
src/main/java/org/apache/commons/io/IOUtils.java:
##########
@@ -2637,57 +2638,61 @@ public static BufferedReader toBufferedReader(final 
Reader reader, final int siz
     }
 
     /**
-     * Gets the contents of an {@link InputStream} as a {@code byte[]}.
-     * <p>
-     * This method buffers the input internally, so there is no need to use a 
{@link BufferedInputStream}.
-     * </p>
+     * Reads all remaining bytes from the given {@link InputStream} into a new 
{@code byte[]}.
      *
-     * @param inputStream the {@link InputStream} to read.
-     * @return the requested byte array.
-     * @throws NullPointerException if the InputStream is {@code null}.
-     * @throws IOException          if an I/O error occurs or reading more 
than {@link Integer#MAX_VALUE} occurs.
+     * <p>The method accumulates the data in temporary buffers and returns a 
single array
+     * containing the entire contents once the end of the stream is 
reached.</p>
+     *
+     * @param input the {@link InputStream} to read; must not be {@code null}.
+     * @return a new byte array.
+     * @throws IllegalArgumentException if the size of the stream is greater 
than {@code Integer.MAX_VALUE}.
+     * @throws IOException          if an I/O error occurs while reading.
+     * @throws NullPointerException if {@code input} is {@code null}.
      */
-    public static byte[] toByteArray(final InputStream inputStream) throws 
IOException {
+    public static byte[] toByteArray(final InputStream input) throws 
IOException {

Review Comment:
   Please leave the parameter name unchanged. We can normalize the other names 
later to "inputStream" for consistency.



##########
src/main/java/org/apache/commons/io/IOUtils.java:
##########
@@ -2697,6 +2702,48 @@ public static byte[] toByteArray(final InputStream 
input, final long size) throw
         return toByteArray(input, (int) size);
     }
 
+    /**
+     * Reads exactly {@code size} bytes from the given {@link InputStream} 
into a new {@code byte[]}.
+     *
+     * <p>The method accumulates the data in temporary buffers of size at most 
{@code bufferSize}

Review Comment:
   I think it would be nice to explain _why_ this method is useful, instead of 
_how_ it works. For example, "When reading from an untrusted input stream, you 
reduce the risk of an OOME by calling this API with...".



##########
src/main/java/org/apache/commons/io/IOUtils.java:
##########
@@ -2637,57 +2638,61 @@ public static BufferedReader toBufferedReader(final 
Reader reader, final int siz
     }
 
     /**
-     * Gets the contents of an {@link InputStream} as a {@code byte[]}.
-     * <p>
-     * This method buffers the input internally, so there is no need to use a 
{@link BufferedInputStream}.
-     * </p>
+     * Reads all remaining bytes from the given {@link InputStream} into a new 
{@code byte[]}.

Review Comment:
   What follows is just about the finer aspects of documenting methods, but 
it's still nice to talk about IMO:
   
   The text can be simpler and more direct, for example, 
`java.nio.file.Files.read(InputStream, int)` says "Reads all the bytes from an 
input stream." After all, the only kinds of bytes in an input stream are the 
"remaining" kind, and it only makes sense to return a "new" array. So I would 
say: "Reads all the bytes from an input stream in a byte array".
   
   As an aside, Java seems to have called out the "remaining" concept formally 
in one API name: "Iterator.forEachRemaining(...)", but that might have been to 
avoid overriding or shadowing the "forEach(...)".



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Re: [PR] feat: Add incremental `toByteArray` method [commons-io]

Reply via email to