garydgregory commented on code in PR #776:
URL: https://github.com/apache/commons-io/pull/776#discussion_r2328895181
##########
src/main/java/org/apache/commons/io/IOUtils.java:
##########
@@ -221,6 +221,21 @@ public class IOUtils {
*/
private static final char[] SCRATCH_CHAR_BUFFER_WO = charArray();
+ /**
+ * The maximum size of an array in many Java VMs.
+ */
+ private static final int MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8;
Review Comment:
Java 12's `ArraySupport` class calls this `SOFT_MAX_ARRAY_LENGTH`, so I
think we should use the same name.
##########
src/main/java/org/apache/commons/io/IOUtils.java:
##########
@@ -2637,57 +2652,65 @@ public static BufferedReader toBufferedReader(final
Reader reader, final int siz
}
/**
- * Gets the contents of an {@link InputStream} as a {@code byte[]}.
- * <p>
- * This method buffers the input internally, so there is no need to use a
{@link BufferedInputStream}.
- * </p>
+ * Reads all the bytes from an input stream in a byte array.
*
- * @param inputStream the {@link InputStream} to read.
- * @return the requested byte array.
- * @throws NullPointerException if the InputStream is {@code null}.
- * @throws IOException if an I/O error occurs or reading more
than {@link Integer#MAX_VALUE} occurs.
+ * <p>The memory used by this method is <strong>proportional</strong> to
the number
+ * of bytes read, which is only limited by {@link Integer#MAX_VALUE}. This
makes it unsuitable for
+ * processing large input streams, unless sufficient heap space is
available.</p>
+ *
+ * @param inputStream The {@link InputStream} to read; must not be {@code
null}.
+ * @return A new byte array.
+ * @throws IllegalArgumentException If the size of the stream is greater
than the maximum array size.
+ * @throws IOException If an I/O error occurs while reading.
+ * @throws NullPointerException If {@code inputStream} is {@code null}.
*/
public static byte[] toByteArray(final InputStream inputStream) throws
IOException {
- // We use a ThresholdingOutputStream to avoid reading AND writing more
than Integer.MAX_VALUE.
- try (UnsynchronizedByteArrayOutputStream ubaOutput =
UnsynchronizedByteArrayOutputStream.builder().get();
- ThresholdingOutputStream thresholdOutput = new
ThresholdingOutputStream(Integer.MAX_VALUE, os -> {
- throw new IllegalArgumentException(String.format("Cannot
read more than %,d into a byte array", Integer.MAX_VALUE));
- }, os -> ubaOutput)) {
- copy(inputStream, thresholdOutput);
- return ubaOutput.toByteArray();
+ final UnsynchronizedByteArrayOutputStream output =
+ copyToOutputStream(inputStream, MAX_ARRAY_LENGTH + 1,
DEFAULT_CHUNK_SIZE);
+ if (output.size() > MAX_ARRAY_LENGTH) {
+ throw new IllegalArgumentException(
+ String.format("Cannot read more than %,d into a byte
array", MAX_ARRAY_LENGTH));
}
+ return output.toByteArray();
}
/**
- * Gets the contents of an {@link InputStream} as a {@code byte[]}. Use
this method instead of
- * {@link #toByteArray(InputStream)} when {@link InputStream} size is
known.
+ * Reads exactly {@code size} bytes from the given {@link InputStream}
into a new {@code byte[]}.
*
- * @param input the {@link InputStream} to read.
- * @param size the size of {@link InputStream} to read, where 0 <
{@code size} <= length of input stream.
- * @return byte [] of length {@code size}.
- * @throws IOException if an I/O error occurs or {@link InputStream}
length is smaller than parameter {@code size}.
- * @throws IllegalArgumentException if {@code size} is less than zero.
+ * <p>The memory used by this method is <strong>proportional</strong> to
the number
+ * of bytes read and limited by the specified {@code size}. This makes it
suitable for
+ * processing large input streams, provided that
<strong>sufficient</strong> heap space is
+ * available.</p>
+ *
+ * @param input the {@link InputStream} to read; must not be {@code null}.
+ * @param size the exact number of bytes to read; must be {@code >= 0}.
+ * @return a new byte array of length {@code size}.
+ * @throws IllegalArgumentException if {@code size} is negative.
+ * @throws EOFException if the stream ends before {@code size}
bytes are read.
+ * @throws IOException if an I/O error occurs while reading.
+ * @throws NullPointerException if {@code input} is {@code null}.
* @since 2.1
*/
- @SuppressWarnings("resource")
public static byte[] toByteArray(final InputStream input, final int size)
throws IOException {
- return toByteArray(Objects.requireNonNull(input, "input")::read, size);
+ return toByteArray(input, size, DEFAULT_CHUNK_SIZE);
Review Comment:
Woa, this is surprising; now we are double-buffering and slowing down
existing call sites that don't want this new feature? It seems to like we
should _only_ do the double buffering when a call site chooses to use the new
API.
##########
src/main/java/org/apache/commons/io/IOUtils.java:
##########
@@ -221,6 +221,21 @@ public class IOUtils {
*/
private static final char[] SCRATCH_CHAR_BUFFER_WO = charArray();
+ /**
+ * The maximum size of an array in many Java VMs.
+ */
+ private static final int MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8;
+
+ /*
+ * Default maximum chunk size used when copying large streams into a byte
array.
+ * <p>
+ * This value is somewhat arbitrary, currently aligned with the value used
by
+ * <a
href="https://github.com/python/cpython/blob/3.14/Lib/_pyio.py">Python</a>
Review Comment:
CPython is completely irrelevant here; it should not be mentioned, as we
certainly do not want to give the impression that we are aligning anything with
CPython. I also like to mention the value with `{@value}` in this kind of
Javadoc.
##########
src/main/java/org/apache/commons/io/IOUtils.java:
##########
@@ -2697,6 +2720,69 @@ public static byte[] toByteArray(final InputStream
input, final long size) throw
return toByteArray(input, (int) size);
}
+ /**
+ * Reads exactly {@code size} bytes from the given {@link InputStream}
into a new {@code byte[]}.
+ *
+ * <p>The memory used by this method is <strong>proportional</strong> to
the number
+ * of bytes read and limited by the specified {@code size}. This makes it
suitable for
+ * processing large input streams, provided that
<strong>sufficient</strong> heap space is
+ * available.</p>
+ *
+ * <p>This method processes the input stream in successive chunks of up to
+ * {@code chunkSize} bytes.</p>
+ *
+ * @param input the {@link InputStream} to read; must not be {@code
null}.
+ * @param size the exact number of bytes to read; must be {@code >=
0}.
+ * The actual bytes read are validated to equal {@code
size}.
+ * @param chunkSize The chunk size for incremental reading; must be
{@code > 0}.
+ * @return a new byte array of length {@code size}.
+ * @throws IllegalArgumentException if {@code size} is negative or {@code
chunkSize <= 0}.
+ * @throws EOFException if the stream ends before {@code size}
bytes are read.
+ * @throws IOException if an I/O error occurs while reading.
+ * @throws NullPointerException if {@code input} is {@code null}.
+ * @since 2.21.0
+ */
+ public static byte[] toByteArray(final InputStream input, final int size,
final int chunkSize) throws IOException {
+ Objects.requireNonNull(input, "input");
+ if (chunkSize <= 0) {
+ throw new IllegalArgumentException("Chunk size must be greater
than zero: " + chunkSize);
+ }
+ if (size <= chunkSize) {
+ // throws if size < 0
+ return toByteArray(input::read, size);
+ }
+ final UnsynchronizedByteArrayOutputStream output =
copyToOutputStream(input, size, chunkSize);
+ if (output.size() != size) {
+ throw new EOFException("Unexpected read size, current: " +
output.size() + ", expected: " + size);
+ }
+ return output.toByteArray();
+ }
+
+ /**
+ * Copies up to {@code size} bytes from the given {@link InputStream} into
a new {@link UnsynchronizedByteArrayOutputStream}.
+ *
Review Comment:
Could you remove extra line?
##########
src/main/java/org/apache/commons/io/IOUtils.java:
##########
@@ -2637,57 +2652,65 @@ public static BufferedReader toBufferedReader(final
Reader reader, final int siz
}
/**
- * Gets the contents of an {@link InputStream} as a {@code byte[]}.
- * <p>
- * This method buffers the input internally, so there is no need to use a
{@link BufferedInputStream}.
- * </p>
+ * Reads all the bytes from an input stream in a byte array.
*
- * @param inputStream the {@link InputStream} to read.
- * @return the requested byte array.
- * @throws NullPointerException if the InputStream is {@code null}.
- * @throws IOException if an I/O error occurs or reading more
than {@link Integer#MAX_VALUE} occurs.
+ * <p>The memory used by this method is <strong>proportional</strong> to
the number
+ * of bytes read, which is only limited by {@link Integer#MAX_VALUE}. This
makes it unsuitable for
+ * processing large input streams, unless sufficient heap space is
available.</p>
+ *
+ * @param inputStream The {@link InputStream} to read; must not be {@code
null}.
+ * @return A new byte array.
+ * @throws IllegalArgumentException If the size of the stream is greater
than the maximum array size.
+ * @throws IOException If an I/O error occurs while reading.
+ * @throws NullPointerException If {@code inputStream} is {@code null}.
*/
public static byte[] toByteArray(final InputStream inputStream) throws
IOException {
- // We use a ThresholdingOutputStream to avoid reading AND writing more
than Integer.MAX_VALUE.
- try (UnsynchronizedByteArrayOutputStream ubaOutput =
UnsynchronizedByteArrayOutputStream.builder().get();
- ThresholdingOutputStream thresholdOutput = new
ThresholdingOutputStream(Integer.MAX_VALUE, os -> {
- throw new IllegalArgumentException(String.format("Cannot
read more than %,d into a byte array", Integer.MAX_VALUE));
- }, os -> ubaOutput)) {
- copy(inputStream, thresholdOutput);
- return ubaOutput.toByteArray();
+ final UnsynchronizedByteArrayOutputStream output =
+ copyToOutputStream(inputStream, MAX_ARRAY_LENGTH + 1,
DEFAULT_CHUNK_SIZE);
Review Comment:
You can use line lengths up to 160 per our Checkstyle, so the code doesn't
end up reading like a newspaper 😉
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]