[ https://issues.apache.org/jira/browse/HADOOP-17789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383559#comment-17383559 ]
Steve Loughran commented on HADOOP-17789:
-----------------------------------------

ah, CSVs. For CSV you must set spark.hadoop.fs.s3a.experimental.input.fadvise=sequential, so a single GET can do a full read.

You are being hurt by the fact that on random IO the stream fetches max(fs.s3a.readahead.range, byte-range in the read() call). Efficient for random IO, awful for a single-file read. Say you have a 10MB file and 512KB readahead: that means 20 GET calls, each with the latency of issuing the GET (hopefully on the same connection) and then the latencies of the response, TCP flow control etc. Setting a large readahead will help CSV slightly, but then you hurt ORC IO, as it will read to the end of each ranged GET whenever it has to seek backwards. (Forward seeks within the same GET will read-and-discard until you get to the right place; still wasteful for seeks of more than a few hundred KB.)

spark.hadoop.fs.s3a.experimental.input.fadvise default
spark.hadoop.fs.s3a.readahead.range 128k

This will default to a sequential read, but on the first backward seek it switches to random IO and smaller reads. Parquet and ORC both read the footers, so they switch "default" reads from sequential to random. There's possibly a hit on that first read being aborted, but after that all is good.

HADOOP-16109 is unrelated; it's just a fix for an off-by-one bug.

Other options I suggest simply expand the sizes of the pools of threads and HTTP connections, and the block size on writes:

fs.s3a.block.size = 64M
fs.s3a.max.total.tasks = 320
fs.s3a.threads.max = 256
fs.s3a.connection.maximum = 300

I see fs.s3a.committer.name = "file" [core-default.xml]; hopefully you override that in Spark.

The optional class org.wildfly.openssl.OpenSSLProvider is not on the classpath.
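To illustrate the arithmetic above, here is a minimal sketch (plain Python, not S3A code; the 8KB request size in the random-IO case is an assumed value for a small columnar read) of how the fetch-size rule max(readahead, requested range) turns one sequential read into many ranged GETs:

```python
def ranged_get_count(file_size: int, readahead: int, requested: int) -> int:
    """Rough count of GET requests needed to read a whole file when each
    read fetches max(readahead, requested bytes), as under random-IO fadvise."""
    fetch = max(readahead, requested)
    # ceiling division: every fetch of `fetch` bytes costs one ranged GET
    return -(-file_size // fetch)

MB, KB = 1024 * 1024, 1024

# 10 MB file, 512 KB readahead, small (assumed 8 KB) reads:
# 20 ranged GETs, as in the example above
print(ranged_get_count(10 * MB, 512 * KB, 8 * KB))    # 20

# sequential fadvise: the whole file in one GET
print(ranged_get_count(10 * MB, 10 * MB, 10 * MB))    # 1
```

Each of those 20 GETs pays request latency, response latency and TCP slow-start again, which is why sequential fadvise wins for a straight-through CSV read.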
Have a look for wildfly.jar... if you can get that on the classpath then, because your ubuntu box has openssl installed, you should get better SSL performance (== more bandwidth from more efficient SSL decryption).

Also, if you are hadoop 3.3.1+ only, try:

fs.s3a.directory.marker.retention = keep

This stops us deleting dir markers on mkdir, PUT etc. and so saves write IOPS. Not compatible with older releases, which see a dir/ marker and conclude that the directory underneath it is empty.

(note: HADOOP-16202 adds a standard way for parquet, orc etc. to ask for random IO, always; I've a fork of parquet which does this)

> S3 read performance with Spark with Hadoop 3.3.1 is slower than older Hadoop
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-17789
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17789
>             Project: Hadoop Common
>          Issue Type: Improvement
> Affects Versions: 3.3.1
>        Reporter: Arghya Saha
>        Priority: Major
>     Attachments: storediag.log
>
> This issue is a continuation of
> https://issues.apache.org/jira/browse/HADOOP-17755
> The input data reported by Spark (Hadoop 3.3.1) was almost double, and the read
> runtime also increased (around 20%) compared to Spark (Hadoop 3.2.0) with the
> exact same amount of resources and the same configuration. And this is happening
> with other jobs as well, which were not impacted by the read-fully error stated above.
> *I was having the same exact issue when I was using the workaround
> fs.s3a.readahead.range = 1G with Hadoop 3.2.0*
>
> Below are further details:
>
> |Hadoop Version|Actual size of the files (in SQL tab)|Reported size of the files (in Stages)|Time to complete the stage|fs.s3a.readahead.range|
> |Hadoop 3.2.0|29.3 GiB|29.3 GiB|23 min|64K|
> |Hadoop 3.3.1|29.3 GiB|*{color:#ff0000}58.7 GiB{color}*|*{color:#ff0000}27 min{color}*|{color:#172b4d}64K{color}|
> |Hadoop 3.2.0|29.3 GiB|*{color:#ff0000}58.7 GiB{color}*|*{color:#ff0000}~27 min{color}*|{color:#172b4d}1G{color}|
>
> * *Shuffle Write* is the same (95.9 GiB) for all three cases above.
>
> I was expecting some improvement (or at least parity with 3.2.0) in read
> operations with Hadoop 3.3.1; please suggest how to approach and resolve this.
> I have used the default s3a config along with the below, on an EKS cluster:
> {code:java}
> spark.hadoop.fs.s3a.committer.magic.enabled: 'true'
> spark.hadoop.fs.s3a.committer.name: magic
> spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a: org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory
> spark.hadoop.fs.s3a.downgrade.syncable.exceptions: "true"{code}
> * I did not use
> {code:java}
> spark.hadoop.fs.s3a.experimental.input.fadvise=random{code}
> And as already mentioned, I have used the same Spark, the same amount of
> resources and the same config. The only change is Hadoop 3.2.0 to Hadoop 3.3.1
> (built with Spark using ./dev/make-distribution.sh --name spark-patched --pip
> -Pkubernetes -Phive -Phive-thriftserver -Dhadoop.version="3.3.1")

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org