[
https://issues.apache.org/jira/browse/HADOOP-16241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16828863#comment-16828863
]
Sahil Takiar commented on HADOOP-16241:
---------------------------------------
Attached a flamegraph of an Impala full table scan of {{web_returns}} - I'm
planning to address some of the SSL decrypt overhead in HADOOP-16050. The other
main bottleneck is from all the lazy seeks which are ultimately dominated by
SSL handshakes.
> S3AInputStream PositionReadable should perform ranged read on dedicated
> stream
> -------------------------------------------------------------------------------
>
> Key: HADOOP-16241
> URL: https://issues.apache.org/jira/browse/HADOOP-16241
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/s3
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Priority: Major
> Attachments: Impala-TPCDS-scans.zip,
> impala-web_returns-scan-flamegraph.svg
>
>
> The current implementation of {{PositionReadable}} in {{S3AInputStream}} is
> pretty close to the default implementation in {{FsInputStream}}.
> This JIRA proposes overriding the {{read(long position, byte[] buffer, int
> offset, int length)}} method and re-implementing the {{readFully(long
> position, byte[] buffer, int offset, int length)}} method in S3A.
> The new implementation would perform a "ranged read" on a dedicated object
> stream (rather than the shared one). Prototypes have shown this to bring a
> considerable performance improvement to readers who are only interested in
> reading a random chunk of the file at a time (e.g. Impala, although I would
> assume HBase would benefit from this as well).
> Setting {{fs.s3a.experimental.input.fadvise}} to {{RANDOM}} is helpful for
> clients that rely on pread, but has a few drawbacks:
> * Unless the client explicitly sets fadvise to RANDOM, they will get at
> least one connection reset when the backwards seek is issued (after which
> fadvise automatically switches to RANDOM)
> * Data is only read in 64 kb chunks, so for a large read, several GET
> requests must be issued to S3 to fetch the data; while the 64 kb chunk value
> is configurable, it is hard to set a reasonable value for variable length
> preads
> * If the readahead value is too big, closing the input stream can take
> considerable time because the stream has to be drained of data before it can
> be closed
> The new implementation of {{PositionReadable}} would issue a
> {{GetObjectRequest}} with the range specified by {{position}} and the size of
> the given buffer. The data would be read from the {{S3ObjectInputStream}} and
> then closed at the end of the method. This stream would be independent of the
> {{wrappedStream}} currently maintained by S3A.
> This brings the following benefits:
> * The {{PositionedReadable}} methods can be thread-safe without a
> {{synchronized}} block, which allows clients to concurrently call pread
> methods on the same {{S3AInputStream}} instance
> * preads will request all the data at once rather than requesting it in
> chunks via the readahead logic
> * Avoids performing potentially expensive seeks when performing preads
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]