[
https://issues.apache.org/jira/browse/HADOOP-18838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751714#comment-17751714
]
Maxim Martynov commented on HADOOP-18838:
-----------------------------------------
> whatever is in core default takes priority, so don't worry about the code
When users investigate issues with connection settings or performance, they
google the main Hadoop-AWS page or the performance page and search for their
case. I don't see any link to `core-default.xml` on these pages, nor in the
source code of this library. That this file takes priority is known to Hadoop
developers, but not to newcomers. This is a bad developer experience.
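To illustrate the precedence mentioned in the quoted reply, here is a minimal sketch that assumes a simplified model of Hadoop's {{Configuration}} lookup (the real class also handles final properties, variable expansion and deprecated keys). Every loaded resource, starting with {{core-default.xml}}, is consulted before the Java constant, so the constant only matters when no resource defines the key. {{EffectiveValueDemo}} and {{resolve}} are hypothetical names.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a Hadoop-style configuration lookup.
class EffectiveValueDemo {

    // Resources are given in load order (e.g. core-default.xml, then core-site.xml).
    // A later resource overrides an earlier one; the Java constant is used only
    // when no resource defines the key at all.
    static String resolve(String key, String codeDefault, List<Map<String, String>> resources) {
        String value = null;
        for (Map<String, String> resource : resources) {
            if (resource.containsKey(key)) {
                value = resource.get(key);
            }
        }
        return value != null ? value : codeDefault;
    }

    public static void main(String[] args) {
        // Values as reported in this issue for 3.3.6.
        Map<String, String> coreDefault = Map.of("fs.s3a.retry.throttle.interval", "100ms");
        Map<String, String> coreSite = Map.of(); // no user override
        String effective = resolve("fs.s3a.retry.throttle.interval", "500ms",
                List.of(coreDefault, coreSite));
        System.out.println(effective); // prints 100ms
    }
}
```

Assuming S3A reads the key through {{Configuration}} with the constant as a fallback, the effective 3.3.6 interval is {{100ms}} from {{core-default.xml}}, not the {{500ms}} constant, unless {{core-site.xml}} overrides it.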
> Some fs.s3a.* config values are different in sources and documentation
> ----------------------------------------------------------------------
>
> Key: HADOOP-18838
> URL: https://issues.apache.org/jira/browse/HADOOP-18838
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 3.3.6
> Reporter: Maxim Martynov
> Priority: Major
>
> For the config option {{fs.s3a.retry.throttle.interval}}, the default value in
> the source code is {{500ms}}:
> {code:java}
> public static final String RETRY_THROTTLE_INTERVAL_DEFAULT = "500ms";
> {code}
> https://github.com/apache/hadoop/blob/rel/release-3.3.6/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java#L921
> In {{core-default.xml}} it has the value {{100ms}}, but the description
> mentions {{500ms}}:
> {code:xml}
> <property>
> <name>fs.s3a.retry.throttle.interval</name>
> <value>100ms</value>
> <description>
> Initial between retry attempts on throttled requests, +/- 50%. chosen at
> random.
> i.e. for an intial value of 3000ms, the initial delay would be in the
> range 1500ms to 4500ms.
> Backoffs are exponential; again randomness is used to avoid the
> thundering heard problem.
> 500ms is the default value used by the AWS S3 Retry policy.
> </description>
> </property>
> {code}
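The behaviour described in that {{core-default.xml}} entry can be sketched as follows. This is a hypothetical helper that only illustrates the wording of the description (an initial interval scaled by a random factor between 0.5 and 1.5, doubling on each attempt); it is not the retry policy S3A actually uses, and {{JitteredBackoff}} and {{delayMillis}} are invented names.

```java
import java.util.Random;

// Hypothetical helper mirroring the core-default.xml wording; not the S3A retry code.
class JitteredBackoff {

    // Delay before 0-based retry `attempt`: the initial interval doubles per
    // attempt, then is scaled by a random factor of roughly 0.5 to 1.5 (+/- 50%).
    static long delayMillis(long initialMillis, int attempt, Random random) {
        if (attempt < 0 || attempt > 30) {
            throw new IllegalArgumentException("attempt out of range: " + attempt);
        }
        long exponential = initialMillis << attempt; // initial * 2^attempt
        double jitter = 0.5 + random.nextDouble();   // random factor in [0.5, 1.5]
        return (long) (exponential * jitter);
    }

    public static void main(String[] args) {
        Random random = new Random(42); // fixed seed for repeatable output
        for (int attempt = 0; attempt < 4; attempt++) {
            System.out.println("attempt " + attempt + ": "
                    + delayMillis(3000, attempt, random) + "ms");
        }
    }
}
```

With {{initialMillis = 3000}} and {{attempt = 0}}, every delay falls between 1500ms and 4500ms, matching the example in the description; the random factor spreads retries from many clients apart.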
> https://github.com/apache/hadoop/blob/rel/release-3.3.6/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml#L1750
> This change was introduced in HADOOP-16823.
> In the Hadoop-AWS module documentation it has the value {{1000ms}}:
> {code:xml}
> <property>
> <name>fs.s3a.retry.throttle.interval</name>
> <value>1000ms</value>
> <description>
> Interval between retry attempts on throttled requests.
> </description>
> </property>
> {code}
> https://github.com/apache/hadoop/blob/rel/release-3.3.6/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md?plain=1#L1223
> The file was created in HADOOP-13786, and the value has been left unchanged
> since then.
> On the performance tuning page it has the up-to-date value {{500ms}}:
> {code:xml}
> <property>
> <name>fs.s3a.retry.throttle.interval</name>
> <value>500ms</value>
> <description>
> Interval between retry attempts on throttled requests.
> </description>
> </property>
> {code}
> https://github.com/apache/hadoop/blob/rel/release-3.3.6/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/performance.md?plain=1#L435
> This change was introduced in HADOOP-15076.
> The same issue affects:
> * {{fs.s3a.retry.throttle.limit}} - in the source code it has the value {{20}},
> but some documents still show the old value ${fs.s3a.attempts.maximum}
> * {{fs.s3a.connection.establish.timeout}} - in the source code it has the value
> {{50_000}}, in the config file & documentation {{5_000}}
> * {{fs.s3a.attempts.maximum}} - in the source code it has the value {{10}}, in
> the config file & documentation {{20}}
> * {{fs.s3a.threads.max}} - in the source code & documentation it has the value
> {{10}}, in the config file {{64}}
> * {{fs.s3a.max.total.tasks}} - in the source code & config file it has the value
> {{32}}, in the documentation {{5}}
> * {{fs.s3a.connection.maximum}} - in the source code & config file it has the
> value {{96}}, in the documentation {{15}} or {{30}}
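One way to keep this kind of drift visible is a check that compares the default recorded in each location. This is a sketch under stated assumptions: the values are hard-coded from the list above as reported for 3.3.6, the location labels are informal, and values are compared as plain strings, so equivalent spellings such as {{1s}} and {{1000ms}} would be flagged. {{DefaultsDrift}} is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical drift check. Values are hard-coded from this issue (3.3.6);
// a real tool would extract them from the Java sources, XML and Markdown files.
class DefaultsDrift {

    // Returns the keys whose recorded defaults are not identical in every location.
    static List<String> inconsistentKeys(Map<String, Map<String, String>> defaults) {
        List<String> drifted = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : defaults.entrySet()) {
            // Plain string comparison: "1s" and "1000ms" would count as different.
            if (new HashSet<>(entry.getValue().values()).size() > 1) {
                drifted.add(entry.getKey());
            }
        }
        return drifted;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> reported = new LinkedHashMap<>();
        reported.put("fs.s3a.retry.throttle.interval", Map.of(
                "Constants.java", "500ms", "core-default.xml", "100ms",
                "index.md", "1000ms", "performance.md", "500ms"));
        reported.put("fs.s3a.connection.establish.timeout", Map.of(
                "Constants.java", "50000", "core-default.xml", "5000", "docs", "5000"));
        reported.put("fs.s3a.attempts.maximum", Map.of(
                "Constants.java", "10", "core-default.xml", "20", "docs", "20"));
        reported.put("fs.s3a.threads.max", Map.of(
                "Constants.java", "10", "core-default.xml", "64", "docs", "10"));
        reported.put("fs.s3a.max.total.tasks", Map.of(
                "Constants.java", "32", "core-default.xml", "32", "docs", "5"));
        for (String key : inconsistentKeys(reported)) {
            System.out.println(key + " differs: " + reported.get(key));
        }
    }
}
```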
> Please sync these values; outdated documentation is very painful to work with.
> As an idea, could the documentation use {{core-default.xml}} directly, or be
> generated from the Javadoc comments in the Java code?
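As a rough sketch of the first idea, the following reads {{<property>}} entries from a {{core-default.xml}}-style document with the JDK's built-in DOM parser and emits a Markdown table of names and defaults, so the documented values would come from a single source. The class name, the prefix filter and the table layout are assumptions for illustration, not part of the Hadoop build; the sample XML in {{main}} is a trimmed stand-in for the real file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical generator: turns core-default.xml-style <property> entries into
// Markdown table rows so documented defaults are derived, not hand-copied.
class DefaultsTableGenerator {

    static String markdownTable(String xml, String prefix) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        StringBuilder out = new StringBuilder("| Property | Default |\n|---|---|\n");
        NodeList properties = doc.getElementsByTagName("property");
        for (int i = 0; i < properties.getLength(); i++) {
            Element property = (Element) properties.item(i);
            String name = text(property, "name");
            if (name.startsWith(prefix)) { // keep only e.g. fs.s3a.* keys
                out.append("| `").append(name).append("` | `")
                   .append(text(property, "value")).append("` |\n");
            }
        }
        return out.toString();
    }

    // Trimmed text of the first nested element with the given tag, or "" if absent.
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<configuration>"
                + "<property><name>fs.s3a.retry.throttle.interval</name><value>100ms</value></property>"
                + "<property><name>fs.defaultFS</name><value>file:///</value></property>"
                + "</configuration>";
        System.out.print(markdownTable(xml, "fs.s3a."));
    }
}
```

Generating the table at build time would make a stale value like {{1000ms}} in the module documentation impossible, at the cost of losing hand-written prose next to each value unless the {{<description>}} text is also emitted.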
--
This message was sent by Atlassian Jira
(v8.20.10#820010)