[ https://issues.apache.org/jira/browse/HADOOP-17789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383559#comment-17383559 ]
Steve Loughran commented on HADOOP-17789:
-----------------------------------------

ah, CSVs. For CSV you must set spark.hadoop.fs.s3a.experimental.input.fadvise=sequential, so a single GET can do a full read.

You are being hurt by the fact that on random IO the stream fetches max(fs.s3a.readahead.range, byte-range in the read() call). Efficient for random IO, awful for a single-file read. Say you have a 10MB file and 512KB readahead: that means 20 GET calls, each with the latency of issuing the GET (hopefully on the same connection) and then the latencies of the response, TCP flow control etc. Setting a large readahead will help CSV slightly, but then you hurt ORC IO, as it will read to the end of each ranged GET whenever it has to seek backwards. (Forward seeks within the same GET will read-and-discard until you get to the right place; still wasteful for seeks of more than a few hundred KB.)

spark.hadoop.fs.s3a.experimental.input.fadvise default
spark.hadoop.fs.s3a.readahead.range 128k

This will default to a sequential read, but on the first backward seek it switches to random IO and smaller reads. Parquet and ORC both read the footers, so they switch "default" reads from sequential to random. There's possibly a hit on that first read being aborted, but after that all is good.

HADOOP-16109 is unrelated; it's just a fix for an off-by-one bug.

Other options I suggest simply expand the sizes of the pools of threads and HTTP connections, and the block size on writes:

fs.s3a.block.size = 64M
fs.s3a.max.total.tasks = 320
fs.s3a.threads.max = 256
fs.s3a.connection.maximum = 300

I see fs.s3a.committer.name = "file" [core-default.xml]; hopefully you override that in Spark.

The optional class org.wildfly.openssl.OpenSSLProvider is not on the classpath.
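To illustrate the arithmetic above, here is a minimal sketch (plain Python, not S3A code; the 8KB request size in the random-IO case is an assumed value for a small columnar read) of how the fetch-size rule max(readahead, requested range) turns one sequential read into many ranged GETs:

```python
def ranged_get_count(file_size: int, readahead: int, requested: int) -> int:
    """Rough count of GET requests needed to read a whole file when each
    read fetches max(readahead, requested bytes), as under random-IO fadvise."""
    fetch = max(readahead, requested)
    # ceiling division: every fetch of `fetch` bytes costs one ranged GET
    return -(-file_size // fetch)

MB, KB = 1024 * 1024, 1024

# 10 MB file, 512 KB readahead, small (assumed 8 KB) reads:
# 20 ranged GETs, as in the example above
print(ranged_get_count(10 * MB, 512 * KB, 8 * KB))    # 20

# sequential fadvise: the whole file in one GET
print(ranged_get_count(10 * MB, 10 * MB, 10 * MB))    # 1
```

Each of those 20 GETs pays request latency, response latency and TCP slow-start again, which is why sequential fadvise wins for a straight-through CSV read.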
Have a look for wildfly.jar... if you can get that on the classpath then, because your ubuntu box has openssl installed, you should get better SSL performance (== more bandwidth from more efficient SSL decryption).

Also, if you are hadoop 3.3.1+ only, try:

fs.s3a.directory.marker.retention = keep

This stops us deleting dir markers on mkdir, PUT etc. and so saves write IOPS. Not compatible with older releases, which see a dir/ marker and conclude that the directory underneath it is empty.

(note: HADOOP-16202 adds a standard way for parquet, orc etc. to ask for random IO, always; I've a fork of parquet which does this)

> S3 read performance with Spark with Hadoop 3.3.1 is slower than older Hadoop
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-17789
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17789
>             Project: Hadoop Common
>          Issue Type: Improvement
> Affects Versions: 3.3.1
>        Reporter: Arghya Saha
>        Priority: Major
>     Attachments: storediag.log
>
> This issue is a continuation of
> https://issues.apache.org/jira/browse/HADOOP-17755
> The input data reported by Spark (Hadoop 3.3.1) was almost double, and the read
> runtime also increased (around 20%) compared to Spark (Hadoop 3.2.0) with the
> exact same amount of resources and the same configuration. And this is happening
> with other jobs as well, which were not impacted by the read-fully error stated above.
> *I was having the same exact issue when I was using the workaround
> fs.s3a.readahead.range = 1G with Hadoop 3.2.0*
>
> Below are further details:
>
> |Hadoop Version|Actual size of the files (in SQL tab)|Reported size of the files (in Stages)|Time to complete the stage|fs.s3a.readahead.range|
> |Hadoop 3.2.0|29.3 GiB|29.3 GiB|23 min|64K|
> |Hadoop 3.3.1|29.3 GiB|*{color:#ff0000}58.7 GiB{color}*|*{color:#ff0000}27 min{color}*|{color:#172b4d}64K{color}|
> |Hadoop 3.2.0|29.3 GiB|*{color:#ff0000}58.7 GiB{color}*|*{color:#ff0000}~27 min{color}*|{color:#172b4d}1G{color}|
>
> * *Shuffle Write* is the same (95.9 GiB) for all three cases above.
>
> I was expecting some improvement (or at least parity with 3.2.0) in read
> operations with Hadoop 3.3.1; please suggest how to approach and resolve this.
> I have used the default s3a config along with the below, on an EKS cluster:
> {code:java}
> spark.hadoop.fs.s3a.committer.magic.enabled: 'true'
> spark.hadoop.fs.s3a.committer.name: magic
> spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a: org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory
> spark.hadoop.fs.s3a.downgrade.syncable.exceptions: "true"{code}
> * I did not use
> {code:java}
> spark.hadoop.fs.s3a.experimental.input.fadvise=random{code}
> And as already mentioned, I have used the same Spark, the same amount of
> resources and the same config. The only change is Hadoop 3.2.0 to Hadoop 3.3.1
> (built with Spark using ./dev/make-distribution.sh --name spark-patched --pip
> -Pkubernetes -Phive -Phive-thriftserver -Dhadoop.version="3.3.1")

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org