Sahil Takiar created IMPALA-8525:
------------------------------------

             Summary: preads should use hdfsPreadFully rather than hdfsPread
                 Key: IMPALA-8525
                 URL: https://issues.apache.org/jira/browse/IMPALA-8525
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
            Reporter: Sahil Takiar
            Assignee: Sahil Takiar


Impala preads (only enabled if {{use_hdfs_pread}} is true) use the 
{{hdfsPread}} API from libhdfs, which ultimately invokes 
{{PositionedReadable#read(long position, byte[] buffer, int offset, int 
length)}} in the HDFS-client.

{{PositionedReadable}} also exposes the method {{readFully(long position, 
byte[] buffer, int offset, int length)}}. The difference is that {{#read}} will 
"Read up to the specified number of bytes" whereas {{#readFully}} will "Read 
the specified number of bytes". So there is no guarantee that {{#read}} will 
read *all* of the request bytes.

Impala calls {{hdfsPread}} inside {{hdfs-file-reader.cc}} and invokes it inside 
a while loop until all the requested bytes have been read from the file. This 
can cause a few performance issues:

(1) if the underlying {{FileSystem}} does not support ByteBuffer reads 
(HDFS-2834) (e.g. S3A does not support this feature) then {{hdfsPread}} will 
allocate a Java array equal in size to specified length of the buffer; the call 
to {{PositionedReadable#read}} may only fill up the buffer partially; Impala 
will repeat the call to {{hdfsPread}} since the buffer was not filled, which 
will cause another large array allocation; this can result in a lot of wasted 
time doing unnecessary array allocations

(2) given that Impala calls {{hdfsPread}} in a while loop, there is no point in 
continuously calling {{hdfsPread}} when a single call to {{hdfsPreadFully}} 
will achieve the same thing (this doesn't actually affect performance much, but 
is unnecessary)

Prior solutions to this problem have been to introduce a "chunk-size" to Impala 
reads (https://gerrit.cloudera.org/#/c/63/ - S3: DiskIoMgr related changes for 
S3). However, with the migration to {{hdfsPreadFully}} the chunk-size is no 
longer necessary.

Furthermore, preads are most effective when the data is read all at once (e.g. 
in 8 MB chunks as specified by {{read_size}}) rather than in smaller chunks 
(typically 128K). For example, {{DFSInputStream#read(long position, byte[] 
buffer, int offset, int length)}} opens up remote block readers with a byte 
range determined by the value of {{length}} passed into the {{#read}} call. 
Similarly, {{S3AInputStream#readFully}} will issue an HTTP GET request with the 
size of the read specified by the given {{length}} (although fadvise must be 
set to RANDOM for this to work).

This work is dependent on exposing {{readFully}} via libhdfs first: HDFS-14478



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to