[jira] [Commented] (IMPALA-8525) preads should use hdfsPreadFully rather than hdfsPread

ASF subversion and git services (Jira) Tue, 19 Nov 2019 17:20:40 -0800


    [ 
https://issues.apache.org/jira/browse/IMPALA-8525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16977969#comment-16977969
 ]


ASF subversion and git services commented on IMPALA-8525:
---------------------------------------------------------

Commit 89b9c93c7ac5f3eb19977290ba5115547120a0a3 in impala's branch 
refs/heads/master from Sahil Takiar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=89b9c93 ]

IMPALA-8525: preads should use hdfsPreadFully rather than hdfsPread

Modifies HdfsFileReader so that it calls hdfsPreadFully instead of
hdfsPread. hdfsPreadFully is a new libhdfs API introduced by HDFS-14564
(Add libhdfs APIs for readFully; add readFully to
ByteBufferPositionedReadable). hdfsPreadFully improves performance of
preads, especially when reading data from S3. The major difference
between hdfsPread and hdfsPreadFully is that hdfsPreadFully is
guaranteed to read all the requested bytes, whereas hdfsPread may return
after reading fewer bytes than requested.
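
The contrast between the two contracts can be modeled in a few lines.
The sketch below is a Python model, not libhdfs or Impala code:
ShortReadFile, pread, and pread_fully are hypothetical names, and the
max_per_call cap is an assumed stand-in for whatever causes an
underlying stream to return fewer bytes than requested.

```python
class ShortReadFile:
    """In-memory stand-in for a file whose positioned reads may return short."""

    def __init__(self, data: bytes, max_per_call: int):
        self.data = data
        self.max_per_call = max_per_call  # assumed cap that forces short reads
        self.calls = 0

    def pread(self, position: int, length: int) -> bytes:
        """Return UP TO `length` bytes starting at `position` (may be fewer)."""
        self.calls += 1
        end = min(position + min(length, self.max_per_call), len(self.data))
        return self.data[position:end]


def pread_fully(f: ShortReadFile, position: int, length: int) -> bytes:
    """Return exactly `length` bytes starting at `position`, or raise EOFError."""
    out = bytearray()
    while len(out) < length:
        chunk = f.pread(position + len(out), length - len(out))
        if not chunk:
            raise EOFError("end of file before all requested bytes were read")
        out += chunk
    return bytes(out)
```

With the pread-style contract the caller has to check the returned
length and retry; with the pread_fully-style contract the caller gets
either all the bytes or an error.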

hdfsPreadFully reduces the number of JNI array allocations necessary
when reading data from S3. When any read method in libhdfs is called,
the method allocates an array whose size is equal to the amount of data
requested. The issue is that Java's InputStream#read only guarantees
that it will read up to the amount of data requested. This can lead to
cases where a libhdfs read request allocates a large Java array, even
though the read only partially fills it.
PositionedReadable#readFully, on the other hand, guarantees that all
requested data will be read, thus preventing any unnecessary JNI array
allocations.
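
A rough accounting shows why the partial fills matter. The sketch below
is a simplified Python model rather than libhdfs code. It assumes each
retry allocates an array sized to the bytes still outstanding and that
each read returns at most bytes_per_read bytes; the function name and
the sizes used are illustrative assumptions, not measured values.

```python
from typing import Tuple

MB = 1024 * 1024


def allocated_with_pread_loop(requested: int, bytes_per_read: int) -> Tuple[int, int]:
    """Return (calls, total bytes allocated) for a caller-side retry loop."""
    calls = 0
    allocated = 0
    remaining = requested
    while remaining > 0:
        calls += 1
        allocated += remaining  # array sized to the outstanding request (assumption)
        remaining -= min(remaining, bytes_per_read)
    return calls, allocated


# Hypothetical case: an 8 MB request where each read returns at most 1 MB.
loop_calls, loop_allocated = allocated_with_pread_loop(8 * MB, 1 * MB)
# A single readFully-style call allocates one array of the requested size.
fully_calls, fully_allocated = 1, 8 * MB
```

Under these assumptions the retry loop allocates 8 + 7 + ... + 1 = 36 MB
across 8 calls, compared with a single 8 MB array for one readFully-style
call.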

hdfsPreadFully improves the effectiveness of
fs.s3a.experimental.input.fadvise=RANDOM (HADOOP-13203). S3A recommends
setting fadvise=RANDOM when doing random reads, which is common in
Impala when reading Parquet or ORC files. fadvise=RANDOM causes the
HTTP GET request that reads the S3 data to simply request the data
bounded by the parameters of the current read request (e.g. for
'read(long position, ..., int length)' it requests 'length' bytes). The
chunk-size optimization in HdfsFileReader hurts performance when
fadvise=RANDOM because each HTTP GET request requests only 'chunk-size'
bytes at a time, which is why this patch also removes the chunk-size
optimization. hdfsPreadFully helps here because all the data in the scan
range is requested by a single HTTP GET request.
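
For a sense of scale, the arithmetic below counts ranged GET requests
for one scan range under fadvise=RANDOM. It is a back-of-the-envelope
Python model: the 8 MB read size and 128K chunk size are the figures
quoted in the issue description, one GET per read call follows the
description above, and get_requests is a hypothetical helper.

```python
KB = 1024
MB = 1024 * KB


def get_requests(scan_range_bytes: int, bytes_per_read: int) -> int:
    """Number of read calls, and thus ranged GETs, needed to cover a scan range."""
    # Integer ceiling division: a final partial read still costs one GET.
    return (scan_range_bytes + bytes_per_read - 1) // bytes_per_read


chunked = get_requests(8 * MB, 128 * KB)  # chunked reads of 128K each
single = get_requests(8 * MB, 8 * MB)     # one read covering the scan range
```

Here chunked evaluates to 64 and single to 1, so in this model the
chunked path pays per-request overhead 64 times for the same 8 MB.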

Since hdfsPreadFully improves S3 read performance, this patch enables
preads for S3A files by default. Even if fadvise=SEQUENTIAL,
hdfsPreadFully still improves performance since it avoids unnecessary
JNI allocation overhead.

The chunk-size optimization (added in
https://gerrit.cloudera.org/#/c/63/) is no longer necessary after this
patch. hdfsPreadFully prevents any unnecessary array allocations.
Furthermore, it is likely the chunk-size optimization was added due to
overhead fixed by HDFS-14285.

Fixes a bug in IMPALA-8884 where the
'impala-server.io-mgr.queue-$i.read-size' statistics were being updated
with the chunk-size passed to HdfsFileReader::ReadFromPosInternal, which
is not necessarily equivalent to the amount of data actually read.
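
The difference between the two quantities is easiest to see at the end
of a file. The sketch below is a hypothetical Python model, not the
Impala metrics code: ReadSizeStats and read_and_record are invented
names, and slicing an in-memory buffer stands in for the actual read.

```python
class ReadSizeStats:
    """Hypothetical stand-in for a read-size statistic."""

    def __init__(self):
        self.samples = []

    def update(self, num_bytes: int) -> None:
        self.samples.append(num_bytes)


def read_and_record(data: bytes, position: int, chunk_size: int,
                    stats: ReadSizeStats) -> bytes:
    """Read up to `chunk_size` bytes and record how many were actually read."""
    chunk = data[position:position + chunk_size]  # shorter near end of file
    stats.update(len(chunk))  # record bytes actually read, not chunk_size
    return chunk
```

Recording chunk_size instead of len(chunk) would overstate the read
size whenever a read comes back short.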

Testing:
* Ran core tests
* Ran core tests on S3
* Ad-hoc functional and performance testing on ABFS; no perf regression
observed; planning to further investigate the interaction between
hdfsPreadFully + ABFS in a future JIRA

Change-Id: I29ea34897096bc790abdeb98073a47f1c4c10feb
Reviewed-on: http://gerrit.cloudera.org:8080/14635
Reviewed-by: Sahil Takiar <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> preads should use hdfsPreadFully rather than hdfsPread
> ------------------------------------------------------
>
>                 Key: IMPALA-8525
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8525
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>
> Impala preads (only enabled if {{use_hdfs_pread}} is true) use the 
> {{hdfsPread}} API from libhdfs, which ultimately invokes 
> {{PositionedReadable#read(long position, byte[] buffer, int offset, int 
> length)}} in the HDFS-client.
> {{PositionedReadable}} also exposes the method {{readFully(long position, 
> byte[] buffer, int offset, int length)}}. The difference is that {{#read}} 
> will "Read up to the specified number of bytes" whereas {{#readFully}} will 
> "Read the specified number of bytes". So there is no guarantee that {{#read}} 
> will read *all* of the requested bytes.
> Impala calls {{hdfsPread}} inside {{hdfs-file-reader.cc}} and invokes it 
> inside a while loop until all the requested bytes have been read from the 
> file. This can cause a few performance issues:
> (1) if the underlying {{FileSystem}} does not support ByteBuffer reads 
> (HDFS-2834) (e.g. S3A does not support this feature) then {{hdfsPread}} will 
> allocate a Java array equal in size to the specified length of the buffer; the 
> call to {{PositionedReadable#read}} may only fill up the buffer partially; 
> Impala will repeat the call to {{hdfsPread}} since the buffer was not filled, 
> which will cause another large array allocation; this can result in a lot of 
> wasted time doing unnecessary array allocations
> (2) given that Impala calls {{hdfsPread}} in a while loop, there is no point 
> in continuously calling {{hdfsPread}} when a single call to 
> {{hdfsPreadFully}} will achieve the same thing (this doesn't actually affect 
> performance much, but is unnecessary)
> Prior solutions to this problem have been to introduce a "chunk-size" to 
> Impala reads (https://gerrit.cloudera.org/#/c/63/ - S3: DiskIoMgr related 
> changes for S3). However, with the migration to {{hdfsPreadFully}} the 
> chunk-size is no longer necessary.
> Furthermore, preads are most effective when the data is read all at once 
> (e.g. in 8 MB chunks as specified by {{read_size}}) rather than in smaller 
> chunks (typically 128K). For example, {{DFSInputStream#read(long position, 
> byte[] buffer, int offset, int length)}} opens up remote block readers with a 
> byte range determined by the value of {{length}} passed into the {{#read}} 
> call. Similarly, {{S3AInputStream#readFully}} will issue an HTTP GET request 
> with the size of the read specified by the given {{length}} (although fadvise 
> must be set to RANDOM for this to work).
> This work is dependent on exposing {{readFully}} via libhdfs first: HDFS-14564



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

