[jira] [Commented] (IMPALA-8525) preads should use hdfsPreadFully rather than hdfsPread

ASF subversion and git services (Jira) Thu, 01 Oct 2020 17:13:45 -0700


    [ https://issues.apache.org/jira/browse/IMPALA-8525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205897#comment-17205897 ]


ASF subversion and git services commented on IMPALA-8525:
---------------------------------------------------------

Commit 8e9cf51f6b328f500acf7c577289c5b888fd15d2 in impala's branch 
refs/heads/master from Sahil Takiar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=8e9cf51 ]

IMPALA-9606: ABFS reads should use hdfsPreadFully

Similar to IMPALA-8525, but for ABFS instead of S3A.
I don't expect this to make a major performance improvement the way
it did for S3A, although I am still seeing a marginal improvement
(about 5% in scan performance) during some ad-hoc testing. The reason
is that the ABFS and S3A client implementations are very different:
ABFS already reads all of the requested data in a single hdfsRead call.

I ran the query 'select * from abfs_test_store_sales order by
ss_net_profit limit 10;' several times to validate that perf
does not regress. In fact, it does improve slightly for this query.
The table 'abfs_test_store_sales' is just a copy of the mini-cluster's
tpcds_parquet.store_sales, although it is not partitioned.

Testing:
* Tested against an ABFS storage account I have access to
* Ran several queries to validate there are no functional
  or perf regressions.

Change-Id: I994ea30cf31abc66f5d82d9b3c8e185d2bd06147
Reviewed-on: http://gerrit.cloudera.org:8080/16531
Reviewed-by: Joe McDonnell
Tested-by: Impala Public Jenkins


> preads should use hdfsPreadFully rather than hdfsPread
> ------------------------------------------------------
>
>                 Key: IMPALA-8525
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8525
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>             Fix For: Impala 3.4.0
>
>
> Impala preads (only enabled if {{use_hdfs_pread}} is true) use the 
> {{hdfsPread}} API from libhdfs, which ultimately invokes 
> {{PositionedReadable#read(long position, byte[] buffer, int offset, int 
> length)}} in the HDFS-client.
> {{PositionedReadable}} also exposes the method {{readFully(long position,
> byte[] buffer, int offset, int length)}}. The difference is that {{#read}}
> will "Read up to the specified number of bytes" whereas {{#readFully}} will
> "Read the specified number of bytes". So there is no guarantee that {{#read}}
> will read *all* of the requested bytes.
> Impala calls {{hdfsPread}} from {{hdfs-file-reader.cc}}, inside a while loop
> that repeats until all the requested bytes have been read from the file.
> This can cause a few performance issues:
> (1) if the underlying {{FileSystem}} does not support ByteBuffer reads
> (HDFS-2834) (e.g. S3A does not support this feature), then {{hdfsPread}} will
> allocate a Java array equal in size to the specified length of the buffer. The
> call to {{PositionedReadable#read}} may only partially fill the buffer, so
> Impala repeats the call to {{hdfsPread}}, which causes another large array
> allocation. This can waste a lot of time on unnecessary array allocations.
> (2) given that Impala calls {{hdfsPread}} in a while loop, there is no point
> in repeatedly calling {{hdfsPread}} when a single call to
> {{hdfsPreadFully}} achieves the same thing (this doesn't actually affect
> performance much, but is unnecessary).
> Previously, this problem was worked around by introducing a "chunk-size" for
> Impala reads (https://gerrit.cloudera.org/#/c/63/ - S3: DiskIoMgr related
> changes for S3). However, with the migration to {{hdfsPreadFully}} the
> chunk-size is no longer necessary.
> Furthermore, preads are most effective when the data is read all at once 
> (e.g. in 8 MB chunks as specified by {{read_size}}) rather than in smaller 
> chunks (typically 128K). For example, {{DFSInputStream#read(long position, 
> byte[] buffer, int offset, int length)}} opens up remote block readers with a 
> byte range determined by the value of {{length}} passed into the {{#read}} 
> call. Similarly, {{S3AInputStream#readFully}} will issue an HTTP GET request 
> with the size of the read specified by the given {{length}} (although fadvise 
> must be set to RANDOM for this to work).
> This work is dependent on exposing {{readFully}} via libhdfs first: HDFS-14564



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
