ppkarwasz commented on code in PR #776: URL: https://github.com/apache/commons-io/pull/776#discussion_r2329690275
########## src/main/java/org/apache/commons/io/IOUtils.java: ########## @@ -2637,57 +2652,65 @@ public static BufferedReader toBufferedReader(final Reader reader, final int siz } /** - * Gets the contents of an {@link InputStream} as a {@code byte[]}. - * <p> - * This method buffers the input internally, so there is no need to use a {@link BufferedInputStream}. - * </p> + * Reads all the bytes from an input stream in a byte array. * - * @param inputStream the {@link InputStream} to read. - * @return the requested byte array. - * @throws NullPointerException if the InputStream is {@code null}. - * @throws IOException if an I/O error occurs or reading more than {@link Integer#MAX_VALUE} occurs. + * <p>The memory used by this method is <strong>proportional</strong> to the number + * of bytes read, which is only limited by {@link Integer#MAX_VALUE}. This makes it unsuitable for + * processing large input streams, unless sufficient heap space is available.</p> + * + * @param inputStream The {@link InputStream} to read; must not be {@code null}. + * @return A new byte array. + * @throws IllegalArgumentException If the size of the stream is greater than the maximum array size. + * @throws IOException If an I/O error occurs while reading. + * @throws NullPointerException If {@code inputStream} is {@code null}. */ public static byte[] toByteArray(final InputStream inputStream) throws IOException { - // We use a ThresholdingOutputStream to avoid reading AND writing more than Integer.MAX_VALUE. - try (UnsynchronizedByteArrayOutputStream ubaOutput = UnsynchronizedByteArrayOutputStream.builder().get(); - ThresholdingOutputStream thresholdOutput = new ThresholdingOutputStream(Integer.MAX_VALUE, os -> { - throw new IllegalArgumentException(String.format("Cannot read more than %,d into a byte array", Integer.MAX_VALUE)); - }, os -> ubaOutput)) { - copy(inputStream, thresholdOutput); - return ubaOutput.toByteArray(); + final UnsynchronizedByteArrayOutputStream output = + copyToOutputStream(inputStream, MAX_ARRAY_LENGTH + 1, DEFAULT_CHUNK_SIZE); + if (output.size() > MAX_ARRAY_LENGTH) { + throw new IllegalArgumentException( + String.format("Cannot read more than %,d into a byte array", MAX_ARRAY_LENGTH)); } + return output.toByteArray(); } /** - * Gets the contents of an {@link InputStream} as a {@code byte[]}. Use this method instead of - * {@link #toByteArray(InputStream)} when {@link InputStream} size is known. + * Reads exactly {@code size} bytes from the given {@link InputStream} into a new {@code byte[]}. * - * @param input the {@link InputStream} to read. - * @param size the size of {@link InputStream} to read, where 0 < {@code size} <= length of input stream. - * @return byte [] of length {@code size}. - * @throws IOException if an I/O error occurs or {@link InputStream} length is smaller than parameter {@code size}. - * @throws IllegalArgumentException if {@code size} is less than zero. + * <p>The memory used by this method is <strong>proportional</strong> to the number + * of bytes read and limited by the specified {@code size}. This makes it suitable for + * processing large input streams, provided that <strong>sufficient</strong> heap space is + * available.</p> + * + * @param input the {@link InputStream} to read; must not be {@code null}. + * @param size the exact number of bytes to read; must be {@code >= 0}. + * @return a new byte array of length {@code size}. + * @throws IllegalArgumentException if {@code size} is negative. + * @throws EOFException if the stream ends before {@code size} bytes are read. + * @throws IOException if an I/O error occurs while reading. + * @throws NullPointerException if {@code input} is {@code null}. * @since 2.1 */ - @SuppressWarnings("resource") public static byte[] toByteArray(final InputStream input, final int size) throws IOException { - return toByteArray(Objects.requireNonNull(input, "input")::read, size); + return toByteArray(input, size, DEFAULT_CHUNK_SIZE); Review Comment: I aligned this method with the JDK methods [`InputStream.readNBytes()`](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/io/InputStream.html#readNBytes(int)), which performs a similar task, but is only available since JDK 11. Our Javadoc in this method does not prevent us from making the switch: > Gets the contents of an `InputStream` as a byte[]. Use this method instead of `toByteArray(InputStream)` when `InputStream` size is known. I think that with the right value for the threshold, the change might be profitable for users, what about using `Math.max(in.available(), DEFAULT_CHUNK_SIZE)` as threshold? This should guarantee that loading from local sources will **not** double-buffer (`FileInputStream` reports the remaining bytes in the file as “available”). What do you think? Personally I would rather not rely on the new behavior in my projects, since there is not guarantee users don't downgrade Commons IO to `2.20.0`, so I am mostly interested in `toByteArray(InputStream, int, int)`, but we might consider improving this method too. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org