[ https://issues.apache.org/jira/browse/HADOOP-18391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583476#comment-17583476 ]
ASF GitHub Bot commented on HADOOP-18391:
-----------------------------------------

steveloughran commented on code in PR #4787:
URL: https://github.com/apache/hadoop/pull/4787#discussion_r952394117

##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/VectoredReadUtils.java:
##########
@@ -114,21 +116,37 @@ private static void readNonByteBufferPositionedReadable(PositionedReadable strea
                           FileRange range,
                           ByteBuffer buffer) throws IOException {
     if (buffer.isDirect()) {
-      buffer.put(readInDirectBuffer(stream, range));
+      readInDirectBuffer(stream, range.getLength(), buffer);
       buffer.flip();
     } else {
       stream.readFully(range.getOffset(), buffer.array(), buffer.arrayOffset(), range.getLength());
     }
   }
 
-  private static byte[] readInDirectBuffer(PositionedReadable stream,
-                                           FileRange range) throws IOException {
-    // if we need to read data from a direct buffer and the stream doesn't
-    // support it, we allocate a byte array to use.
-    byte[] tmp = new byte[range.getLength()];
-    stream.readFully(range.getOffset(), tmp, 0, tmp.length);
-    return tmp;
+  /**
+   * Read bytes from stream into a byte buffer using an
+   * intermediate byte array.
+   * @param stream input stream.
+   * @param length number of bytes to read.
+   * @param buffer buffer to fill.
+   * @throws IOException any IOE.
+   */
+  private static void readInDirectBuffer(PositionedReadable stream,
+                                         int length,
+                                         ByteBuffer buffer) throws IOException {
+    int readBytes = 0;
+    int offset = 0;
+    byte[] tmp = new byte[TMP_BUFFER_MAX_SIZE];

Review Comment:
   Size this as min(length, TMP_BUFFER_MAX_SIZE) so that small reads do not
   allocate the full-size temporary buffer.


> Improve VectoredReadUtils
> -------------------------
>
>                 Key: HADOOP-18391
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18391
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs
>    Affects Versions: 3.3.9
>            Reporter: Steve Loughran
>            Assignee: Mukund Thakur
>            Priority: Major
>              Labels: pull-request-available
>
> Harden the VectoredReadUtils methods for consistent and more robust use,
> especially in those filesystems which don't have the API.
> VectoredReadUtils.readInDirectBuffer should allocate a max buffer size, e.g.
> 4 MB, then do repeated reads and copies; this ensures that you don't OOM with
> many threads doing ranged requests. Other libs do this.
> readVectored to call validateNonOverlappingAndReturnSortedRanges before
> iterating;
> this ensures the abfs/s3a requirements are always met and, because
> ranges will be read in order, that prefetching by other clients keeps
> their performance good.
> readVectored to add special handling for 0 byte ranges.
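
For illustration, the bounded-buffer approach described in the issue, together with the min(length, TMP_BUFFER_MAX_SIZE) sizing suggested in the review comment, could look roughly like the sketch below. This is not the code from PR #4787. The class BoundedDirectRead, the PositionedSource interface (a stand-in for a positioned-read stream such as Hadoop's PositionedReadable), the fromArray helper, the explicit position and maxTmpSize parameters, and the 4 MB value given to TMP_BUFFER_MAX_SIZE are all assumptions made so that the example compiles on its own.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;

/** Sketch only: every name in this class is illustrative, not Hadoop's API. */
final class BoundedDirectRead {

  /** Assumed cap on the temporary array (4 MB, the figure the issue suggests). */
  static final int TMP_BUFFER_MAX_SIZE = 4 * 1024 * 1024;

  /** Hypothetical stand-in for a stream that supports positioned reads. */
  interface PositionedSource {
    void readFully(long position, byte[] dest, int offset, int length)
        throws IOException;
  }

  /**
   * Copy length bytes starting at position into buffer, going through a
   * temporary array of at most maxTmpSize bytes. The caller flips the
   * buffer afterwards, as the call site in the diff does.
   */
  static void readInDirectBuffer(PositionedSource source, long position,
      int length, ByteBuffer buffer, int maxTmpSize) throws IOException {
    if (length == 0) {
      return; // zero-byte range: nothing to read or allocate
    }
    if (length < 0 || maxTmpSize <= 0) {
      throw new IllegalArgumentException("invalid length or buffer size");
    }
    if (buffer.remaining() < length) {
      throw new IllegalArgumentException("buffer too small for range");
    }
    // Never allocate more than the range needs, and never more than the cap.
    byte[] tmp = new byte[Math.min(length, maxTmpSize)];
    int readBytes = 0;
    while (readBytes < length) {
      int chunk = Math.min(tmp.length, length - readBytes);
      source.readFully(position + readBytes, tmp, 0, chunk);
      buffer.put(tmp, 0, chunk);
      readBytes += chunk;
    }
  }

  /** In-memory source, used here only to exercise the sketch. */
  static PositionedSource fromArray(byte[] data) {
    return (position, dest, offset, length) -> {
      if (position < 0 || position + length > data.length) {
        throw new EOFException("read past end of data");
      }
      System.arraycopy(data, (int) position, dest, offset, length);
    };
  }

  public static void main(String[] args) throws IOException {
    byte[] data = new byte[10_000];
    ByteBuffer direct = ByteBuffer.allocateDirect(5_000);
    readInDirectBuffer(fromArray(data), 100L, 5_000, direct, TMP_BUFFER_MAX_SIZE);
    direct.flip();
    System.out.println("bytes copied: " + direct.remaining());
  }
}
```

With this shape, the heap used per in-flight range is bounded by the cap rather than by the range length, which is the OOM concern raised in the issue, while a small range allocates only as many bytes as it needs.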