[
https://issues.apache.org/jira/browse/HADOOP-18391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583476#comment-17583476
]
ASF GitHub Bot commented on HADOOP-18391:
-----------------------------------------
steveloughran commented on code in PR #4787:
URL: https://github.com/apache/hadoop/pull/4787#discussion_r952394117
##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/VectoredReadUtils.java:
##########
@@ -114,21 +116,37 @@ private static void readNonByteBufferPositionedReadable(PositionedReadable strea
FileRange range,
ByteBuffer buffer)
throws IOException {
if (buffer.isDirect()) {
- buffer.put(readInDirectBuffer(stream, range));
+ readInDirectBuffer(stream, range.getLength(), buffer);
buffer.flip();
} else {
stream.readFully(range.getOffset(), buffer.array(),
buffer.arrayOffset(), range.getLength());
}
}
- private static byte[] readInDirectBuffer(PositionedReadable stream,
- FileRange range) throws IOException {
- // if we need to read data from a direct buffer and the stream doesn't
- // support it, we allocate a byte array to use.
- byte[] tmp = new byte[range.getLength()];
- stream.readFully(range.getOffset(), tmp, 0, tmp.length);
- return tmp;
+ /**
+ * Read bytes from stream into a byte buffer using an
+ * intermediate byte array.
+ * @param stream input stream.
+ * @param length number of bytes to read.
+ * @param buffer buffer to fill.
+ * @throws IOException any IOE.
+ */
+ private static void readInDirectBuffer(PositionedReadable stream,
+ int length,
+ ByteBuffer buffer) throws IOException {
+ int readBytes = 0;
+ int offset = 0;
+ byte[] tmp = new byte[TMP_BUFFER_MAX_SIZE];
Review Comment:
Size the array as min(length, TMP_BUFFER_MAX_SIZE) so that small reads don't allocate the full maximum-size buffer.
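A minimal sketch of what this suggestion could look like, not the code in the PR. The `PositionedSource` interface is a hypothetical stand-in for Hadoop's `PositionedReadable`, the explicit `position` parameter and the `fromArray` helper are illustrative only, and the 64 KB value for `TMP_BUFFER_MAX_SIZE` is an assumption (the diff does not show the real constant).

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;

public class ChunkedDirectRead {

  /** Hypothetical stand-in for PositionedReadable; only readFully is modelled. */
  interface PositionedSource {
    void readFully(long position, byte[] dest, int offset, int length)
        throws IOException;
  }

  /** Assumed cap on the temporary array; the real value is not in the diff. */
  static final int TMP_BUFFER_MAX_SIZE = 64 * 1024;

  /**
   * Copy {@code length} bytes starting at {@code position} into
   * {@code buffer}, going through a heap array that is never larger
   * than the read itself or than TMP_BUFFER_MAX_SIZE.
   */
  static void readInDirectBuffer(PositionedSource stream,
      long position,
      int length,
      ByteBuffer buffer) throws IOException {
    if (length == 0) {
      return;
    }
    // The reviewer's point: a 100-byte read gets a 100-byte array.
    byte[] tmp = new byte[Math.min(length, TMP_BUFFER_MAX_SIZE)];
    int readBytes = 0;
    while (readBytes < length) {
      int chunk = Math.min(tmp.length, length - readBytes);
      stream.readFully(position + readBytes, tmp, 0, chunk);
      buffer.put(tmp, 0, chunk);
      readBytes += chunk;
    }
  }

  /** In-memory source, used only to exercise the sketch. */
  static PositionedSource fromArray(byte[] data) {
    return (position, dest, offset, length) -> {
      if (position < 0 || position + length > data.length) {
        throw new EOFException("read past end of data");
      }
      System.arraycopy(data, (int) position, dest, offset, length);
    };
  }
}
```

With this sizing, memory use per call is bounded by the cap for large ranges and by the range length for small ones; the caller still flips the buffer afterwards, as in the diff above.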
> Improve VectoredReadUtils
> -------------------------
>
> Key: HADOOP-18391
> URL: https://issues.apache.org/jira/browse/HADOOP-18391
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs
> Affects Versions: 3.3.9
> Reporter: Steve Loughran
> Assignee: Mukund Thakur
> Priority: Major
> Labels: pull-request-available
>
> Harden the VectoredReadUtils methods for consistent and more robust use,
> especially in those filesystems which don't have the API.
> VectoredReadUtils.readInDirectBuffer should allocate a buffer of a maximum size, e.g.
> 4 MB, then do repeated reads and copies; this ensures that you don't OOM with
> many threads doing ranged requests. Other libraries do this.
> readVectored should call validateNonOverlappingAndReturnSortedRanges before
> iterating.
> This ensures the abfs/s3a requirements are always met, and, because
> ranges will be read in order, that prefetching by other clients will keep their
> performance good.
> readVectored should add special handling for 0-byte ranges.
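The sort-then-validate step described in the issue can be sketched as follows. This is an illustration under stated assumptions, not the actual validateNonOverlappingAndReturnSortedRanges implementation: the `Range` class is a hypothetical stand-in for Hadoop's `FileRange`, and the choice of `IllegalArgumentException` for rejected input is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class RangeValidation {

  /** Hypothetical stand-in for FileRange: an offset and a length. */
  static final class Range {
    final long offset;
    final int length;

    Range(long offset, int length) {
      this.offset = offset;
      this.length = length;
    }
  }

  /**
   * Return a copy of the ranges sorted by offset, rejecting negative
   * values and any pair of ranges that overlap.
   */
  static List<Range> sortAndValidate(List<Range> input) {
    List<Range> sorted = new ArrayList<>(input);
    sorted.sort((a, b) -> Long.compare(a.offset, b.offset));
    Range prev = null;
    for (Range r : sorted) {
      if (r.offset < 0 || r.length < 0) {
        throw new IllegalArgumentException("negative offset or length");
      }
      // After sorting, only neighbours can overlap.
      if (prev != null && r.offset < prev.offset + prev.length) {
        throw new IllegalArgumentException(
            "overlapping ranges at offset " + r.offset);
      }
      prev = r;
    }
    return sorted;
  }
}
```

Validating once up front means the read loop can assume ascending, disjoint ranges, which is the ordering property the issue relies on for prefetching.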