MAPREDUCE-6260. Convert site documentation to markdown (Masatake Iwasaki via aw)
Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/8b787e2f
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/8b787e2f
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/8b787e2f

Branch: refs/heads/trunk
Commit: 8b787e2fdbd0050c0345cf14b26af9d61049068f
Parents: 34b78d5
Author: Allen Wittenauer <a...@apache.org>
Authored: Tue Feb 17 06:52:14 2015 -1000
Committer: Allen Wittenauer <a...@apache.org>
Committed: Tue Feb 17 06:52:14 2015 -1000

----------------------------------------------------------------------
 hadoop-mapreduce-project/CHANGES.txt            |    3 +
 .../src/site/apt/DistributedCacheDeploy.apt.vm  |  151 -
 .../src/site/apt/EncryptedShuffle.apt.vm        |  320 ---
 .../src/site/apt/MapReduceTutorial.apt.vm       | 1605 -----------
 ...pReduce_Compatibility_Hadoop1_Hadoop2.apt.vm |  114 -
 .../src/site/apt/MapredAppMasterRest.apt.vm     | 2709 ------------------
 .../src/site/apt/MapredCommands.apt.vm          |  233 --
 .../apt/PluggableShuffleAndPluggableSort.apt.vm |   98 -
 .../site/markdown/DistributedCacheDeploy.md.vm  |  119 +
 .../src/site/markdown/EncryptedShuffle.md       |  255 ++
 .../src/site/markdown/MapReduceTutorial.md      | 1156 ++++++++
 .../MapReduce_Compatibility_Hadoop1_Hadoop2.md  |   69 +
 .../src/site/markdown/MapredAppMasterRest.md    | 2397 ++++++++++++++++
 .../src/site/markdown/MapredCommands.md         |  153 +
 .../PluggableShuffleAndPluggableSort.md         |   73 +
 .../src/site/apt/HistoryServerRest.apt.vm       | 2672 -----------------
 .../src/site/markdown/HistoryServerRest.md      | 2361 +++++++++++++++
 17 files changed, 6586 insertions(+), 7902 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hadoop/blob/8b787e2f/hadoop-mapreduce-project/CHANGES.txt
----------------------------------------------------------------------
diff --git a/hadoop-mapreduce-project/CHANGES.txt b/hadoop-mapreduce-project/CHANGES.txt
index 9ef7a32..aebc71e 100644
--- a/hadoop-mapreduce-project/CHANGES.txt
+++ b/hadoop-mapreduce-project/CHANGES.txt
@@ -96,6 +96,9 @@ Trunk (Unreleased)
 
     MAPREDUCE-6250. deprecate sbin/mr-jobhistory-daemon.sh (aw)
 
+    MAPREDUCE-6260. Convert site documentation to markdown (Masatake Iwasaki
+    via aw)
+
   BUG FIXES
 
     MAPREDUCE-6191. Improve clearing stale state of Java serialization

http://git-wip-us.apache.org/repos/asf/hadoop/blob/8b787e2f/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/DistributedCacheDeploy.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/DistributedCacheDeploy.apt.vm b/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/DistributedCacheDeploy.apt.vm
deleted file mode 100644
index 2195e10..0000000
--- a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/DistributedCacheDeploy.apt.vm
+++ /dev/null
@@ -1,151 +0,0 @@
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~   http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License. See accompanying LICENSE file.
-
-  ---
-  Hadoop Map Reduce Next Generation-${project.version} - Distributed Cache Deploy
-  ---
-  ---
-  ${maven.build.timestamp}
-
-Hadoop MapReduce Next Generation - Distributed Cache Deploy
-
-* Introduction
-
-  The MapReduce application framework has rudimentary support for deploying a
-  new version of the MapReduce framework via the distributed cache. By setting
-  the appropriate configuration properties, users can run a different version
-  of MapReduce than the one initially deployed to the cluster. For example,
-  cluster administrators can place multiple versions of MapReduce in HDFS and
-  configure <<<mapred-site.xml>>> to specify which version jobs will use by
-  default. This allows the administrators to perform a rolling upgrade of the
-  MapReduce framework under certain conditions.
-
-* Preconditions and Limitations
-
-  The support for deploying the MapReduce framework via the distributed cache
-  currently does not address the job client code used to submit and query
-  jobs. It also does not address the <<<ShuffleHandler>>> code that runs as an
-  auxiliary service within each NodeManager. As a result, the following
-  limitations apply to MapReduce versions that can be successfully deployed via
-  the distributed cache in a rolling upgrade fashion:
-
-  * The MapReduce version must be compatible with the job client code used to
-    submit and query jobs. If it is incompatible then the job client must be
-    upgraded separately on any node from which jobs using the new MapReduce
-    version will be submitted or queried.
-
-  * The MapReduce version must be compatible with the configuration files used
-    by the job client submitting the jobs. If it is incompatible with that
-    configuration (e.g. a new property must be set or an existing property
-    value changed) then the configuration must be updated first.
-
-  * The MapReduce version must be compatible with the <<<ShuffleHandler>>>
-    version running on the nodes in the cluster. If it is incompatible then the
-    new <<<ShuffleHandler>>> code must be deployed to all the nodes in the
-    cluster, and the NodeManagers must be restarted to pick up the new
-    <<<ShuffleHandler>>> code.
-
-* Deploying a New MapReduce Version via the Distributed Cache
-
-  Deploying a new MapReduce version consists of three steps:
-
-  [[1]] Upload the MapReduce archive to a location that can be accessed by the
-  job submission client. Ideally the archive should be on the cluster's default
-  filesystem at a publicly-readable path. See the archive location discussion
-  below for more details.
-
-  [[2]] Configure <<<mapreduce.application.framework.path>>> to point to the
-  location where the archive is located. As when specifying distributed cache
-  files for a job, this is a URL that also supports creating an alias for the
-  archive if a URL fragment is specified. For example,
-  <<<hdfs:/mapred/framework/hadoop-mapreduce-${project.version}.tar.gz#mrframework>>>
-  will be localized as <<<mrframework>>> rather than
-  <<<hadoop-mapreduce-${project.version}.tar.gz>>>.
-
-  [[3]] Configure <<<mapreduce.application.classpath>>> to set the proper
-  classpath to use with the MapReduce archive configured above. NOTE: An error
-  occurs if <<<mapreduce.application.framework.path>>> is configured but
-  <<<mapreduce.application.classpath>>> does not reference the base name of the
-  archive path or the alias if an alias was specified.
-
-** Location of the MapReduce Archive and How It Affects Job Performance
-
-  Note that the location of the MapReduce archive can be critical to job
-  submission and job startup performance. If the archive is not located on the
-  cluster's default filesystem then it will be copied to the job staging
-  directory for each job and localized to each node where the job's tasks
-  run. This will slow down job submission and task startup performance.
-
-  If the archive is located on the default filesystem then the job client will
-  not upload the archive to the job staging directory for each job
-  submission. However, if the archive path is not readable by all cluster users
-  then the archive will be localized separately for each user on each node
-  where tasks execute. This can cause unnecessary duplication in the
-  distributed cache.
-
-  When working with a large cluster it can be important to increase the
-  replication factor of the archive to increase its availability. This will
-  spread the load when the nodes in the cluster localize the archive for the
-  first time.
-
-* MapReduce Archives and Classpath Configuration
-
-  Setting a proper classpath for the MapReduce archive depends upon the
-  composition of the archive and whether it has any additional dependencies.
-  For example, the archive can contain not only the MapReduce jars but also the
-  necessary YARN, HDFS, and Hadoop Common jars and all other dependencies. In
-  that case, <<<mapreduce.application.classpath>>> would be configured to
-  something like the following example, where the archive basename is
-  hadoop-mapreduce-${project.version}.tar.gz and the archive is organized
-  internally similar to the standard Hadoop distribution archive:
-
-  <<<$HADOOP_CONF_DIR,$PWD/hadoop-mapreduce-${project.version}.tar.gz/hadoop-mapreduce-${project.version}/share/hadoop/mapreduce/*,$PWD/hadoop-mapreduce-${project.version}.tar.gz/hadoop-mapreduce-${project.version}/share/hadoop/mapreduce/lib/*,$PWD/hadoop-mapreduce-${project.version}.tar.gz/hadoop-mapreduce-${project.version}/share/hadoop/common/*,$PWD/hadoop-mapreduce-${project.version}.tar.gz/hadoop-mapreduce-${project.version}/share/hadoop/common/lib/*,$PWD/hadoop-mapreduce-${project.version}.tar.gz/hadoop-mapreduce-${project.version}/share/hadoop/yarn/*,$PWD/hadoop-mapreduce-${project.version}.tar.gz/hadoop-mapreduce-${project.version}/share/hadoop/yarn/lib/*,$PWD/hadoop-mapreduce-${project.version}.tar.gz/hadoop-mapreduce-${project.version}/share/hadoop/hdfs/*,$PWD/hadoop-mapreduce-${project.version}.tar.gz/hadoop-mapreduce-${project.version}/share/hadoop/hdfs/lib/*>>>
-
-  Another possible approach is to have the archive consist of just the
-  MapReduce jars and have the remaining dependencies picked up from the Hadoop
-  distribution installed on the nodes. In that case, the above example would
-  change to something like the following:
-
-  <<<$HADOOP_CONF_DIR,$PWD/hadoop-mapreduce-${project.version}.tar.gz/hadoop-mapreduce-${project.version}/share/hadoop/mapreduce/*,$PWD/hadoop-mapreduce-${project.version}.tar.gz/hadoop-mapreduce-${project.version}/share/hadoop/mapreduce/lib/*,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*>>>
-
-** NOTE:
-
-  If shuffle encryption is also enabled in the cluster, MR jobs may fail with an exception like the following:
-
-+---+
-2014-10-10 02:17:16,600 WARN [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Failed to connect to junpingdu-centos5-3.cs1cloud.internal:13562 with 1 map outputs
-javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
-	at com.sun.net.ssl.internal.ssl.Alerts.getSSLException(Alerts.java:174)
-	at com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1731)
-	at com.sun.net.ssl.internal.ssl.Handshaker.fatalSE(Handshaker.java:241)
-	at com.sun.net.ssl.internal.ssl.Handshaker.fatalSE(Handshaker.java:235)
-	at com.sun.net.ssl.internal.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1206)
-	at com.sun.net.ssl.internal.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:136)
-	at com.sun.net.ssl.internal.ssl.Handshaker.processLoop(Handshaker.java:593)
-	at com.sun.net.ssl.internal.ssl.Handshaker.process_record(Handshaker.java:529)
-	at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:925)
-	at com.sun.net.ssl.internal.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1170)
-	at com.sun.net.ssl.internal.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1197)
-	at com.sun.net.ssl.internal.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1181)
-	at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:434)
-	at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.setNewClient(AbstractDelegateHttpsURLConnection.java:81)
-	at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.setNewClient(AbstractDelegateHttpsURLConnection.java:61)
-	at sun.net.www.protocol.http.HttpURLConnection.writeRequests(HttpURLConnection.java:584)
-	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1193)
-	at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:379)
-	at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:318)
-	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.verifyConnection(Fetcher.java:427)
-....
-
-+---+
-
-  This is because the MR client (deployed from HDFS) cannot access ssl-client.xml in the local FS under $HADOOP_CONF_DIR. To fix the problem, add the directory containing ssl-client.xml to the MR classpath specified in "mapreduce.application.classpath", as described above. To keep the MR application from being affected by other local configuration files, it is better to create a dedicated directory for ssl-client.xml, e.g. a sub-directory under $HADOOP_CONF_DIR such as $HADOOP_CONF_DIR/security.
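The alias rule in step 2 and the classpath check in step 3 of the DistributedCacheDeploy page above can be illustrated with a small Python sketch. It is a simplified re-implementation for illustration only, not Hadoop's own code; the helper names (`localized_name`, `classpath_references_archive`, `full_archive_classpath`) and the `X.Y.Z` version string are hypothetical placeholders.

```python
"""Sketch of the framework-path alias rule and the classpath check described
in the DistributedCacheDeploy page. Simplified for illustration; this is not
Hadoop's implementation, and "X.Y.Z" is a placeholder version."""
import posixpath
from urllib.parse import urlsplit


def localized_name(framework_path):
    """Name the archive is localized under: the URL fragment (alias) when
    present, otherwise the base name of the archive path."""
    parts = urlsplit(framework_path)
    return parts.fragment or posixpath.basename(parts.path)


def classpath_references_archive(framework_path, classpath):
    """True if any comma-separated classpath entry mentions the localized name.
    Per the page, a configuration failing this check causes an error."""
    name = localized_name(framework_path)
    return any(name in entry.strip() for entry in classpath.split(","))


def full_archive_classpath(localized, top_dir):
    """Classpath for an archive bundling the MapReduce, Common, YARN and HDFS
    jars in the standard distribution layout (same order as the page)."""
    entries = ["$HADOOP_CONF_DIR"]
    for component in ("mapreduce", "common", "yarn", "hdfs"):
        base = f"$PWD/{localized}/{top_dir}/share/hadoop/{component}"
        entries += [f"{base}/*", f"{base}/lib/*"]
    return ",".join(entries)


version = "X.Y.Z"  # hypothetical placeholder for ${project.version}
path = f"hdfs:/mapred/framework/hadoop-mapreduce-{version}.tar.gz#mrframework"
cp = full_archive_classpath(localized_name(path), f"hadoop-mapreduce-{version}")
print(localized_name(path))                    # mrframework
print(classpath_references_archive(path, cp))  # True
```

Keeping `#mrframework` in the framework path while building the classpath from the plain `.tar.gz` base name makes `classpath_references_archive` return False, which corresponds to the error case in step 3.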
http://git-wip-us.apache.org/repos/asf/hadoop/blob/8b787e2f/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/EncryptedShuffle.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/EncryptedShuffle.apt.vm b/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/EncryptedShuffle.apt.vm
deleted file mode 100644
index 1761ad8..0000000
--- a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/EncryptedShuffle.apt.vm
+++ /dev/null
@@ -1,320 +0,0 @@
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~   http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License. See accompanying LICENSE file.
-
-  ---
-  Hadoop Map Reduce Next Generation-${project.version} - Encrypted Shuffle
-  ---
-  ---
-  ${maven.build.timestamp}
-
-Hadoop MapReduce Next Generation - Encrypted Shuffle
-
-* {Introduction}
-
-  The Encrypted Shuffle capability allows encryption of the MapReduce shuffle
-  using HTTPS and with optional client authentication (also known as
-  bi-directional HTTPS, or HTTPS with client certificates). It comprises:
-
-  * A Hadoop configuration setting for toggling the shuffle between HTTP and
-    HTTPS.
-
-  * Hadoop configuration settings for specifying the keystore and truststore
-    properties (location, type, passwords) used by the shuffle service and the
-    reducer tasks fetching shuffle data.
-
-  * A way to reload truststores across the cluster (when a node is added or
-    removed).
-
-* {Configuration}
-
-** <<core-site.xml>> Properties
-
-  To enable encrypted shuffle, set the following properties in core-site.xml of
-  all nodes in the cluster:
-
-*--------------------------------------+---------------------+-----------------+
-| <<Property>> | <<Default Value>> | <<Explanation>> |
-*--------------------------------------+---------------------+-----------------+
-| <<<hadoop.ssl.require.client.cert>>> | <<<false>>> | Whether client certificates are required |
-*--------------------------------------+---------------------+-----------------+
-| <<<hadoop.ssl.hostname.verifier>>> | <<<DEFAULT>>> | The hostname verifier to provide for HttpsURLConnections. Valid values are: <<DEFAULT>>, <<STRICT>>, <<STRICT_IE6>>, <<DEFAULT_AND_LOCALHOST>> and <<ALLOW_ALL>> |
-*--------------------------------------+---------------------+-----------------+
-| <<<hadoop.ssl.keystores.factory.class>>> | <<<org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory>>> | The KeyStoresFactory implementation to use |
-*--------------------------------------+---------------------+-----------------+
-| <<<hadoop.ssl.server.conf>>> | <<<ssl-server.xml>>> | Resource file from which ssl server keystore information will be extracted. This file is looked up in the classpath; typically it should be in the Hadoop conf/ directory |
-*--------------------------------------+---------------------+-----------------+
-| <<<hadoop.ssl.client.conf>>> | <<<ssl-client.xml>>> | Resource file from which ssl client keystore information will be extracted. This file is looked up in the classpath; typically it should be in the Hadoop conf/ directory |
-*--------------------------------------+---------------------+-----------------+
-| <<<hadoop.ssl.enabled.protocols>>> | <<<TLSv1>>> | The supported SSL protocols (JDK6 can use <<TLSv1>>, JDK7+ can use <<TLSv1,TLSv1.1,TLSv1.2>>) |
-*--------------------------------------+---------------------+-----------------+
-
-  <<IMPORTANT:>> Currently, <<<hadoop.ssl.require.client.cert>>> should be set to false.
-  Refer to the {{{ClientCertificates}Client Certificates}} section for details.
-
-  <<IMPORTANT:>> All these properties should be marked as final in the cluster
-  configuration files.
-
-*** Example:
-
-------
-  ...
-  <property>
-    <name>hadoop.ssl.require.client.cert</name>
-    <value>false</value>
-    <final>true</final>
-  </property>
-
-  <property>
-    <name>hadoop.ssl.hostname.verifier</name>
-    <value>DEFAULT</value>
-    <final>true</final>
-  </property>
-
-  <property>
-    <name>hadoop.ssl.keystores.factory.class</name>
-    <value>org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory</value>
-    <final>true</final>
-  </property>
-
-  <property>
-    <name>hadoop.ssl.server.conf</name>
-    <value>ssl-server.xml</value>
-    <final>true</final>
-  </property>
-
-  <property>
-    <name>hadoop.ssl.client.conf</name>
-    <value>ssl-client.xml</value>
-    <final>true</final>
-  </property>
-  ...
-------
-
-** <<<mapred-site.xml>>> Properties
-
-  To enable encrypted shuffle, set the following property in mapred-site.xml
-  of all nodes in the cluster:
-
-*--------------------------------------+---------------------+-----------------+
-| <<Property>> | <<Default Value>> | <<Explanation>> |
-*--------------------------------------+---------------------+-----------------+
-| <<<mapreduce.shuffle.ssl.enabled>>> | <<<false>>> | Whether encrypted shuffle is enabled |
-*--------------------------------------+---------------------+-----------------+
-
-  <<IMPORTANT:>> This property should be marked as final in the cluster
-  configuration files.
-
-*** Example:
-
-------
-  ...
-  <property>
-    <name>mapreduce.shuffle.ssl.enabled</name>
-    <value>true</value>
-    <final>true</final>
-  </property>
-  ...
-------
-
-  The Linux container executor should be set to prevent job tasks from
-  reading the server keystore information and gaining access to the shuffle
-  server certificates.
-
-  Refer to Hadoop Kerberos configuration for details on how to do this.
-
-* {Keystore and Truststore Settings}
-
-  Currently <<<FileBasedKeyStoresFactory>>> is the only <<<KeyStoresFactory>>>
-  implementation. The <<<FileBasedKeyStoresFactory>>> implementation uses the
-  following properties, in the <<ssl-server.xml>> and <<ssl-client.xml>> files,
-  to configure the keystores and truststores.
-
-** <<<ssl-server.xml>>> (Shuffle server) Configuration:
-
-  The mapred user should own the <<ssl-server.xml>> file and have exclusive
-  read access to it.
-
-*---------------------------------------------+---------------------+-----------------+
-| <<Property>> | <<Default Value>> | <<Explanation>> |
-*---------------------------------------------+---------------------+-----------------+
-| <<<ssl.server.keystore.type>>> | <<<jks>>> | Keystore file type |
-*---------------------------------------------+---------------------+-----------------+
-| <<<ssl.server.keystore.location>>> | NONE | Keystore file location. The mapred user should own this file and have exclusive read access to it. |
-*---------------------------------------------+---------------------+-----------------+
-| <<<ssl.server.keystore.password>>> | NONE | Keystore file password |
-*---------------------------------------------+---------------------+-----------------+
-| <<<ssl.server.truststore.type>>> | <<<jks>>> | Truststore file type |
-*---------------------------------------------+---------------------+-----------------+
-| <<<ssl.server.truststore.location>>> | NONE | Truststore file location. The mapred user should own this file and have exclusive read access to it. |
-*---------------------------------------------+---------------------+-----------------+
-| <<<ssl.server.truststore.password>>> | NONE | Truststore file password |
-*---------------------------------------------+---------------------+-----------------+
-| <<<ssl.server.truststore.reload.interval>>> | 10000 | Truststore reload interval, in milliseconds |
-*---------------------------------------------+---------------------+-----------------+
-
-*** Example:
-
-------
-<configuration>
-
-  <!-- Server Certificate Store -->
-  <property>
-    <name>ssl.server.keystore.type</name>
-    <value>jks</value>
-  </property>
-  <property>
-    <name>ssl.server.keystore.location</name>
-    <value>${user.home}/keystores/server-keystore.jks</value>
-  </property>
-  <property>
-    <name>ssl.server.keystore.password</name>
-    <value>serverfoo</value>
-  </property>
-
-  <!-- Server Trust Store -->
-  <property>
-    <name>ssl.server.truststore.type</name>
-    <value>jks</value>
-  </property>
-  <property>
-    <name>ssl.server.truststore.location</name>
-    <value>${user.home}/keystores/truststore.jks</value>
-  </property>
-  <property>
-    <name>ssl.server.truststore.password</name>
-    <value>clientserverbar</value>
-  </property>
-  <property>
-    <name>ssl.server.truststore.reload.interval</name>
-    <value>10000</value>
-  </property>
-</configuration>
-------
-
-** <<<ssl-client.xml>>> (Reducer/Fetcher) Configuration:
-
-  The mapred user should own the <<ssl-client.xml>> file and it should have
-  default permissions.
-
-*---------------------------------------------+---------------------+-----------------+
-| <<Property>> | <<Default Value>> | <<Explanation>> |
-*---------------------------------------------+---------------------+-----------------+
-| <<<ssl.client.keystore.type>>> | <<<jks>>> | Keystore file type |
-*---------------------------------------------+---------------------+-----------------+
-| <<<ssl.client.keystore.location>>> | NONE | Keystore file location. The mapred user should own this file and it should have default permissions. |
-*---------------------------------------------+---------------------+-----------------+
-| <<<ssl.client.keystore.password>>> | NONE | Keystore file password |
-*---------------------------------------------+---------------------+-----------------+
-| <<<ssl.client.truststore.type>>> | <<<jks>>> | Truststore file type |
-*---------------------------------------------+---------------------+-----------------+
-| <<<ssl.client.truststore.location>>> | NONE | Truststore file location. The mapred user should own this file and it should have default permissions. |
-*---------------------------------------------+---------------------+-----------------+
-| <<<ssl.client.truststore.password>>> | NONE | Truststore file password |
-*---------------------------------------------+---------------------+-----------------+
-| <<<ssl.client.truststore.reload.interval>>> | 10000 | Truststore reload interval, in milliseconds |
-*---------------------------------------------+---------------------+-----------------+
-
-*** Example:
-
-------
-<configuration>
-
-  <!-- Client certificate Store -->
-  <property>
-    <name>ssl.client.keystore.type</name>
-    <value>jks</value>
-  </property>
-  <property>
-    <name>ssl.client.keystore.location</name>
-    <value>${user.home}/keystores/client-keystore.jks</value>
-  </property>
-  <property>
-    <name>ssl.client.keystore.password</name>
-    <value>clientfoo</value>
-  </property>
-
-  <!-- Client Trust Store -->
-  <property>
-    <name>ssl.client.truststore.type</name>
-    <value>jks</value>
-  </property>
-  <property>
-    <name>ssl.client.truststore.location</name>
-    <value>${user.home}/keystores/truststore.jks</value>
-  </property>
-  <property>
-    <name>ssl.client.truststore.password</name>
-    <value>clientserverbar</value>
-  </property>
-  <property>
-    <name>ssl.client.truststore.reload.interval</name>
-    <value>10000</value>
-  </property>
-</configuration>
-------
-
-* Activating Encrypted Shuffle
-
-  When you have made the above configuration changes, activate Encrypted
-  Shuffle by restarting all NodeManagers.
-
-  <<IMPORTANT:>> Using encrypted shuffle will incur a significant
-  performance impact. Users should profile this and potentially reserve
-  1 or more cores for encrypted shuffle.
-
-* {ClientCertificates} Client Certificates
-
-  Using Client Certificates does not fully ensure that the client is a
-  reducer task for the job. Currently, the keystore files holding the Client
-  Certificates (and their private keys) must be readable by all users
-  submitting jobs to the cluster. This means that a rogue job could read those
-  keystore files and use the client certificates in them to establish a
-  secure connection with a Shuffle server. However, unless the rogue job has a
-  proper JobToken, it won't be able to retrieve shuffle data from the Shuffle
-  server. A job, using its own JobToken, can only retrieve shuffle data that
-  belongs to itself.
-
-* Reloading Truststores
-
-  By default the truststores will reload their configuration every 10 seconds.
-  If a new truststore file is copied over the old one, it will be re-read,
-  and its certificates will replace the old ones. This mechanism is useful for
-  adding or removing nodes from the cluster, or for adding or removing trusted
-  clients. In these cases, the client or NodeManager certificate is added to
-  (or removed from) all the truststore files in the system, and the new
-  configuration will be picked up without you having to restart the NodeManager
-  daemons.
-
-* Debugging
-
-  <<NOTE:>> Enable debugging only for troubleshooting, and then only for jobs
-  running on small amounts of data. It is very verbose and slows down jobs by
-  several orders of magnitude. (You might need to increase mapred.task.timeout
-  to prevent jobs from failing because tasks run so slowly.)
-
-  To enable SSL debugging in the reducers, set <<<-Djavax.net.debug=all>>> in
-  the <<<mapreduce.reduce.java.opts>>> property; for example:
-
-------
-  <property>
-    <name>mapreduce.reduce.java.opts</name>
-    <value>-Xmx200m -Djavax.net.debug=all</value>
-  </property>
-------
-
-  You can do this on a per-job basis, or by means of a cluster-wide setting in
-  the <<<mapred-site.xml>>> file.
-
-  To enable SSL debugging in the NodeManager, set it in the <<<yarn-env.sh>>> file:
-
-------
-  YARN_NODEMANAGER_OPTS="-Djavax.net.debug=all $YARN_NODEMANAGER_OPTS"
-------
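The rules stated in the Encrypted Shuffle page above (shuffle SSL enabled, client certificates not required, a recognized hostname verifier, and the listed properties marked final) lend themselves to a quick pre-flight check before the NodeManagers are restarted. The Python sketch below is hypothetical: `parse_config` and `check_encrypted_shuffle` are illustrative names, not Hadoop tools, and the check does not cover keystore ownership or file permissions, which the page also requires.

```python
"""Hypothetical pre-flight check for the encrypted shuffle settings described
in the EncryptedShuffle page. Parses Hadoop-style <configuration> XML with the
standard library; it is an illustration, not a Hadoop tool."""
import xml.etree.ElementTree as ET

# Valid hadoop.ssl.hostname.verifier values (the IE6 variant is STRICT_IE6).
VALID_VERIFIERS = {"DEFAULT", "STRICT", "STRICT_IE6",
                   "DEFAULT_AND_LOCALHOST", "ALLOW_ALL"}

# Properties the page says should be marked final in the cluster configuration.
MUST_BE_FINAL = (
    "hadoop.ssl.require.client.cert",
    "hadoop.ssl.hostname.verifier",
    "hadoop.ssl.keystores.factory.class",
    "hadoop.ssl.server.conf",
    "hadoop.ssl.client.conf",
    "hadoop.ssl.enabled.protocols",
    "mapreduce.shuffle.ssl.enabled",
)


def parse_config(xml_text):
    """Map each property name to a (value, is_final) tuple."""
    props = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name", "").strip()
        value = prop.findtext("value", "").strip()
        final = prop.findtext("final", "false").strip().lower() == "true"
        props[name] = (value, final)
    return props


def check_encrypted_shuffle(props):
    """Return a list of problems; an empty list means every check passed.
    Missing properties fall back to the defaults listed in the tables."""
    problems = []
    if props.get("mapreduce.shuffle.ssl.enabled", ("false", False))[0].lower() != "true":
        problems.append("mapreduce.shuffle.ssl.enabled is not true")
    if props.get("hadoop.ssl.require.client.cert", ("false", False))[0].lower() != "false":
        problems.append("hadoop.ssl.require.client.cert should be false")
    verifier = props.get("hadoop.ssl.hostname.verifier", ("DEFAULT", False))[0]
    if verifier not in VALID_VERIFIERS:
        problems.append(f"unknown hostname verifier {verifier!r}")
    for name in MUST_BE_FINAL:
        if name in props and not props[name][1]:
            problems.append(f"{name} is not marked final")
    return problems


core_site = """<configuration>
  <property><name>hadoop.ssl.require.client.cert</name><value>false</value><final>true</final></property>
  <property><name>hadoop.ssl.hostname.verifier</name><value>DEFAULT</value><final>true</final></property>
</configuration>"""
mapred_site = """<configuration>
  <property><name>mapreduce.shuffle.ssl.enabled</name><value>true</value></property>
</configuration>"""

# core-site.xml and mapred-site.xml are separate files, so merge their settings.
props = {**parse_config(core_site), **parse_config(mapred_site)}
print(check_encrypted_shuffle(props))  # ['mapreduce.shuffle.ssl.enabled is not marked final']
```

Adding `<final>true</final>` to the `mapreduce.shuffle.ssl.enabled` property in the sample `mapred_site` string makes the check return an empty list.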