[ https://issues.apache.org/jira/browse/HADOOP-15076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323239#comment-16323239 ]
Abraham Fine edited comment on HADOOP-15076 at 1/11/18 11:49 PM:
-----------------------------------------------------------------

I'm new to this codebase, so I think I was able to pick out a few parts of the documentation that may be confusing to new users.

h3. performance.md

* Would it be possible to change the introduction section from two sequential lists to a table? That may make it easier to compare S3 and HDFS.
* {{list files a lot. This includes the setup of all queries agains data:}} typo in "agains".
* {{The MapReduce `FileOutputCommitter`. This also used by Apache Spark.}} I'm not sure what this sentence is trying to express.
* {{Your problem may appear to be performance, but really it is that the commit protocol is both slow and unreliable}} Isn't the commit protocol being slow part of "performance"? Can this be rephrased?
* {{This is leads to maximum read throughput}} "This will lead to..."?
* Perhaps describe the {{random}} policy before {{normal}}, as one needs to understand {{random}} before understanding {{normal}}.
* {{may consume large amounts of resources if each query is working with a different set of s3 buckets}} Why wouldn't a large amount of resources be consumed if working with the same set of s3 buckets?
* {{When uploading data, it is uploaded in blocks set by the option}} Consider changing to "Data is uploaded in blocks set by the option...".
* Extra newline on line 451.

h3. troubleshooting_s3a.md

* {{Whatever problem you have, changing the AWS SDK version will not fix things, only change the stack traces you see.}} Again, I'm new here, so I'm not sure about the history of this issue, but this section seems a little heavy-handed to me. Does Amazon never release "bug fix" versions of their client that are API-compatible? How can we make this statement with such certainty?
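For anyone following the {{random}} vs. {{normal}} input-policy discussion in the comment above, here is a minimal sketch of how a seek policy is selected. The property name {{fs.s3a.experimental.input.fadvise}} comes from the S3A documentation under review; the behavior summary in the comments is my reading of the docs, so verify both against your Hadoop version:

```xml
<!-- core-site.xml sketch: choose the S3A input (seek) policy.
     "random"     - optimized for positioned reads, e.g. columnar
                    formats such as ORC and Parquet.
     "sequential" - optimized for whole-file scans and copies.
     "normal"     - the default; described in the docs in terms of
                    the other two policies, which is why the comment
                    suggests explaining "random" first. -->
<property>
  <name>fs.s3a.experimental.input.fadvise</name>
  <value>random</value>
</property>
```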
> Enhance s3a troubleshooting docs, add perf section
> --------------------------------------------------
>
>                 Key: HADOOP-15076
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15076
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: documentation, fs/s3
>    Affects Versions: 2.8.2
>            Reporter: Steve Loughran
>            Assignee: Abraham Fine
>         Attachments: HADOOP-15076-001.patch, HADOOP-15076-002.patch, HADOOP-15076-003.patch, HADOOP-15076-004.patch
>
> A recurrent theme in s3a-related JIRAs, support calls, etc. is "tried upgrading the AWS SDK JAR and then I got the error ...". We know here "don't do that", but it's not something immediately obvious to lots of downstream users who want to be able to drop in the new JAR to fix things/add new features.
> We need to spell this out quite clearly: "you cannot safely expect to do this. If you want to upgrade the SDK, you will need to rebuild the whole of hadoop-aws with the maven POM updated to the latest version, ideally rerunning all the tests to make sure something hasn't broken."
> Maybe near the top of the index.md file, along with "never share your AWS credentials with anyone".

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
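The issue description's advice about upgrading the SDK via the Maven POM (rather than dropping in a new JAR) can be sketched as a POM fragment. This is an illustrative assumption, not a drop-in change: the exact property name and the SDK version to use depend on your source tree, so check hadoop-project/pom.xml before editing:

```xml
<!-- Sketch only: bump the AWS SDK version used to build hadoop-aws.
     Property name assumed from hadoop-project/pom.xml; verify it in
     your branch, then rebuild hadoop-aws and rerun its test suite. -->
<properties>
  <aws-java-sdk.version><!-- desired SDK version --></aws-java-sdk.version>
</properties>
```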