Sahil Takiar created IMPALA-8525:
------------------------------------
Summary: preads should use hdfsPreadFully rather than hdfsPread
Key: IMPALA-8525
URL: https://issues.apache.org/jira/browse/IMPALA-8525
Project: IMPALA
Issue Type: Improvement
Components: Backend
Reporter: Sahil Takiar
Assignee: Sahil Takiar
Impala preads (only enabled if {{use_hdfs_pread}} is true) use the
{{hdfsPread}} API from libhdfs, which ultimately invokes
{{PositionedReadable#read(long position, byte[] buffer, int offset, int
length)}} in the HDFS-client.
{{PositionedReadable}} also exposes the method {{readFully(long position,
byte[] buffer, int offset, int length)}}. The difference is that {{#read}} will
"Read up to the specified number of bytes" whereas {{#readFully}} will "Read
the specified number of bytes". So there is no guarantee that {{#read}} will
read *all* of the requested bytes.
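To make the two contracts concrete, here is a minimal C++ sketch using a hypothetical in-memory stand-in. {{MockPositionedReadable}} and its fields are invented for illustration and are not part of libhdfs or the HDFS client. {{Read}} may return fewer bytes than requested, while {{ReadFully}} either fills the whole range or reports failure (the Java method signals premature EOF with an exception; a {{bool}} is used here for simplicity).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical in-memory stand-in for a positioned-read stream.
// This is NOT the libhdfs or HDFS client API, only an illustration.
struct MockPositionedReadable {
  std::vector<uint8_t> data;
  size_t max_bytes_per_read;  // simulates short reads from the underlying store

  // Like #read: copies up to `length` bytes and may return fewer.
  // Returns 0 when `position` is at or past the end of the data.
  int64_t Read(int64_t position, uint8_t* buffer, int64_t length) const {
    assert(position >= 0 && length >= 0);
    if (static_cast<size_t>(position) >= data.size()) return 0;
    const int64_t available = static_cast<int64_t>(data.size()) - position;
    const int64_t n = std::min(
        {length, available, static_cast<int64_t>(max_bytes_per_read)});
    std::memcpy(buffer, data.data() + position, static_cast<size_t>(n));
    return n;
  }

  // Like #readFully: succeeds only if exactly `length` bytes were read.
  bool ReadFully(int64_t position, uint8_t* buffer, int64_t length) const {
    int64_t done = 0;
    while (done < length) {
      const int64_t n = Read(position + done, buffer + done, length - done);
      if (n <= 0) return false;  // premature end of data
      done += n;
    }
    return true;
  }
};
```

With 100 bytes of data and {{max_bytes_per_read}} set to 16, a single {{Read}} of 64 bytes returns only 16, whereas {{ReadFully}} of the same range fills all 64.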
Impala calls {{hdfsPread}} inside {{hdfs-file-reader.cc}} and invokes it inside
a while loop until all the requested bytes have been read from the file. This
can cause a few performance issues:
(1) if the underlying {{FileSystem}} does not support ByteBuffer reads
(HDFS-2834) (e.g. S3A does not support this feature), then {{hdfsPread}} will
allocate a Java array equal in size to the specified length of the buffer. The
call to {{PositionedReadable#read}} may fill the buffer only partially, so
Impala repeats the call to {{hdfsPread}}, which causes another large array
allocation. This can waste a lot of time on unnecessary array allocations
(2) given that Impala calls {{hdfsPread}} in a while loop, there is no point in
continuously calling {{hdfsPread}} when a single call to {{hdfsPreadFully}}
will achieve the same thing (this doesn't actually affect performance much, but
is unnecessary)
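The allocation cost in (1) can be estimated with a small simulation. This is a sketch under stated assumptions, not Impala or libhdfs code: it assumes each retry asks for all remaining bytes, that each call allocates a temporary array of that requested size, and that a call fills at most {{max_fill}} bytes. It also assumes the fully-reading path needs only one array. The names {{AllocStats}}, {{SimulatePreadLoop}} and {{SimulatePreadFully}} are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Tally of simulated temporary array allocations on the non-ByteBuffer path.
struct AllocStats {
  int64_t calls = 0;
  int64_t bytes_allocated = 0;
};

// Simulates a caller looping over a pread-style call. Assumption: every call
// allocates an array as large as the bytes still requested, but fills at most
// `max_fill` bytes of it before returning.
AllocStats SimulatePreadLoop(int64_t request_len, int64_t max_fill) {
  assert(request_len >= 0 && max_fill > 0);
  AllocStats stats;
  int64_t remaining = request_len;
  while (remaining > 0) {
    stats.calls++;
    stats.bytes_allocated += remaining;  // array sized to the requested length
    remaining -= std::min(remaining, max_fill);
  }
  return stats;
}

// Simulates a single fully-reading call. Assumption: one array of
// `request_len` bytes is allocated once.
AllocStats SimulatePreadFully(int64_t request_len) {
  assert(request_len >= 0);
  AllocStats stats;
  if (request_len > 0) {
    stats.calls = 1;
    stats.bytes_allocated = request_len;
  }
  return stats;
}
```

Under these assumptions, an 8 MB request that is filled 1 MB at a time takes 8 calls and allocates 8 + 7 + ... + 1 = 36 MB of temporary arrays, compared with a single 8 MB allocation for the fully-reading call.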
A prior solution to this problem was to introduce a "chunk-size" for Impala
reads (https://gerrit.cloudera.org/#/c/63/ - S3: DiskIoMgr related changes for
S3). However, with the migration to {{hdfsPreadFully}} the chunk-size is no
longer necessary.
Furthermore, preads are most effective when the data is read all at once (e.g.
in 8 MB chunks as specified by {{read_size}}) rather than in smaller chunks
(typically 128K). For example, {{DFSInputStream#read(long position, byte[]
buffer, int offset, int length)}} opens up remote block readers with a byte
range determined by the value of {{length}} passed into the {{#read}} call.
Similarly, {{S3AInputStream#readFully}} will issue an HTTP GET request with the
size of the read specified by the given {{length}} (although fadvise must be
set to RANDOM for this to work).
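As a rough illustration of why the per-call {{length}} matters, the sketch below counts how many positioned reads are needed to cover a request. It assumes each call maps to one byte-range operation (one remote block reader or one HTTP GET); {{NumRangedReads}} is a hypothetical helper, not an Impala or HDFS function.

```cpp
#include <cassert>
#include <cstdint>

// Number of positioned reads needed to cover `total_bytes` when each call
// asks for at most `bytes_per_call` bytes (ceiling division).
int64_t NumRangedReads(int64_t total_bytes, int64_t bytes_per_call) {
  assert(total_bytes >= 0 && bytes_per_call > 0);
  return (total_bytes + bytes_per_call - 1) / bytes_per_call;
}
```

For an 8 MB read, 128K chunks give 64 ranged reads, while a single 8 MB call gives one.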
This work depends on first exposing {{readFully}} via libhdfs: HDFS-14478.