[ https://issues.apache.org/jira/browse/HADOOP-18391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583476#comment-17583476 ]

ASF GitHub Bot commented on HADOOP-18391:
-----------------------------------------

steveloughran commented on code in PR #4787:
URL: https://github.com/apache/hadoop/pull/4787#discussion_r952394117


##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/VectoredReadUtils.java:
##########
@@ -114,21 +116,37 @@ private static void readNonByteBufferPositionedReadable(PositionedReadable stream,
                                                            FileRange range,
                                                            ByteBuffer buffer) throws IOException {
     if (buffer.isDirect()) {
-      buffer.put(readInDirectBuffer(stream, range));
+      readInDirectBuffer(stream, range.getLength(), buffer);
       buffer.flip();
     } else {
       stream.readFully(range.getOffset(), buffer.array(),
               buffer.arrayOffset(), range.getLength());
     }
   }
 
-  private static byte[] readInDirectBuffer(PositionedReadable stream,
-                                           FileRange range) throws IOException {
-    // if we need to read data from a direct buffer and the stream doesn't
-    // support it, we allocate a byte array to use.
-    byte[] tmp = new byte[range.getLength()];
-    stream.readFully(range.getOffset(), tmp, 0, tmp.length);
-    return tmp;
+  /**
+   * Read bytes from stream into a byte buffer using an
+   * intermediate byte array.
+   * @param stream input stream.
+   * @param length number of bytes to read.
+   * @param buffer buffer to fill.
+   * @throws IOException any IOE.
+   */
+  private static void readInDirectBuffer(PositionedReadable stream,
+                                         int length,
+                                         ByteBuffer buffer) throws IOException {
+    int readBytes = 0;
+    int offset = 0;
+    byte[] tmp = new byte[TMP_BUFFER_MAX_SIZE];

Review Comment:
   make the buffer size a min of length and TMP_BUFFER_MAX_SIZE, for more efficiency on small reads
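The suggestion can be sketched outside Hadoop with a plain byte[] standing in for the PositionedReadable stream. The class name, the helper readFully, and the TMP_BUFFER_MAX_SIZE value here are illustrative assumptions, not the committed Hadoop code; the point is the Math.min cap on the temporary array plus the chunked copy loop.

```java
import java.nio.ByteBuffer;

// Illustrative sketch of the review suggestion: cap the temporary array at
// Math.min(length, TMP_BUFFER_MAX_SIZE) so small reads allocate only what
// they need, while large reads are copied in bounded chunks.
public class ChunkedReadSketch {

  // Hypothetical cap; the real Hadoop constant may differ.
  static final int TMP_BUFFER_MAX_SIZE = 64 * 1024;

  // Stand-in for PositionedReadable.readFully, reading from an in-memory source.
  static void readFully(byte[] source, long position, byte[] dst, int off, int len) {
    System.arraycopy(source, (int) position, dst, off, len);
  }

  // Copy length bytes starting at position into a (direct) buffer via a
  // temporary array no larger than TMP_BUFFER_MAX_SIZE.
  static void readInDirectBuffer(byte[] source, long position, int length,
                                 ByteBuffer buffer) {
    if (length == 0) {
      return;
    }
    byte[] tmp = new byte[Math.min(length, TMP_BUFFER_MAX_SIZE)];
    int readBytes = 0;
    while (readBytes < length) {
      int chunk = Math.min(tmp.length, length - readBytes);
      readFully(source, position + readBytes, tmp, 0, chunk);
      buffer.put(tmp, 0, chunk);
      readBytes += chunk;
    }
  }

  public static void main(String[] args) {
    byte[] src = new byte[200_000];
    for (int i = 0; i < src.length; i++) {
      src[i] = (byte) i;
    }
    // 150 000 bytes forces several loop iterations with a 64 KiB tmp array.
    ByteBuffer direct = ByteBuffer.allocateDirect(150_000);
    readInDirectBuffer(src, 10, 150_000, direct);
    direct.flip();
    if (direct.get(0) != src[10] || direct.get(149_999) != src[150_009]) {
      throw new AssertionError("chunked read mismatch");
    }
    System.out.println("ok");
  }
}
```

With this shape, a 100-byte range allocates a 100-byte temporary array rather than the full cap, which is what the comment is after.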





> Improve VectoredReadUtils
> -------------------------
>
>                 Key: HADOOP-18391
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18391
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs
>    Affects Versions: 3.3.9
>            Reporter: Steve Loughran
>            Assignee: Mukund Thakur
>            Priority: Major
>              Labels: pull-request-available
>
> harden the VectoredReadUtils methods for consistent and more robust use, 
> especially in those filesystems which don't have the API.
> VectoredReadUtils.readInDirectBuffer should allocate at most a maximum 
> buffer size, e.g. 4 MB, then do repeated reads and copies; this ensures you 
> don't OOM when many threads are doing ranged requests. Other libraries do 
> this.
> readVectored should call validateNonOverlappingAndReturnSortedRanges before 
> iterating.
> This ensures the abfs/s3a requirements are always met, and, because ranges 
> will be read in order, prefetching by other clients will keep their 
> performance good.
> readVectored should add special handling for 0-byte ranges.
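The sort-and-validate step described above can be sketched as follows. The Range record is a simplified stand-in for Hadoop's FileRange, and the method name mirrors, but does not reproduce, the real validateNonOverlappingAndReturnSortedRanges; it is only an assumption about the shape of the check.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch: sort ranges by offset, then reject any pair that overlaps, so the
// caller always iterates ranges in ascending-offset order.
public class RangeValidationSketch {

  // Simplified stand-in for Hadoop's FileRange.
  record Range(long offset, int length) {}

  static List<Range> validateNonOverlappingAndSort(List<Range> input) {
    List<Range> sorted = new ArrayList<>(input);
    sorted.sort(Comparator.comparingLong(Range::offset));
    for (int i = 1; i < sorted.size(); i++) {
      Range prev = sorted.get(i - 1);
      // A range may start exactly where the previous one ends, but no earlier.
      if (sorted.get(i).offset() < prev.offset() + prev.length()) {
        throw new IllegalArgumentException("overlapping ranges");
      }
    }
    return sorted;
  }

  public static void main(String[] args) {
    List<Range> out = validateNonOverlappingAndSort(List.of(
        new Range(100, 50), new Range(0, 100)));
    if (out.get(0).offset() != 0 || out.get(1).offset() != 100) {
      throw new AssertionError("sort failed");
    }
    System.out.println("ok");
  }
}
```

Sorting before iterating is what lets sequential stores (and their server-side prefetchers) see the reads in ascending order, as the description notes.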



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
