http://git-wip-us.apache.org/repos/asf/hadoop/blob/e9d26fe9/hadoop-common-project/hadoop-common/src/site/apt/Metrics.apt.vm ---------------------------------------------------------------------- diff --git a/hadoop-common-project/hadoop-common/src/site/apt/Metrics.apt.vm b/hadoop-common-project/hadoop-common/src/site/apt/Metrics.apt.vm deleted file mode 100644 index 02ff28b..0000000 --- a/hadoop-common-project/hadoop-common/src/site/apt/Metrics.apt.vm +++ /dev/null @@ -1,879 +0,0 @@ -~~ Licensed to the Apache Software Foundation (ASF) under one or more -~~ contributor license agreements. See the NOTICE file distributed with -~~ this work for additional information regarding copyright ownership. -~~ The ASF licenses this file to You under the Apache License, Version 2.0 -~~ (the "License"); you may not use this file except in compliance with -~~ the License. You may obtain a copy of the License at -~~ -~~ http://www.apache.org/licenses/LICENSE-2.0 -~~ -~~ Unless required by applicable law or agreed to in writing, software -~~ distributed under the License is distributed on an "AS IS" BASIS, -~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -~~ See the License for the specific language governing permissions and -~~ limitations under the License. - - --- - Metrics Guide - --- - --- - ${maven.build.timestamp} - -%{toc} - -Overview - - Metrics are statistical information exposed by Hadoop daemons, - used for monitoring, performance tuning, and debugging. - There are many metrics available by default - and they are very useful for troubleshooting. - This page shows the details of the available metrics. - - Each section describes one context into which metrics are grouped. - - The documentation of the Metrics 2.0 framework is - {{{../../api/org/apache/hadoop/metrics2/package-summary.html}here}}. - -jvm context - -* JvmMetrics - - Each metrics record contains tags such as ProcessName, SessionId - and Hostname as additional information along with metrics.
- -*-------------------------------------+--------------------------------------+ -|| Name || Description -*-------------------------------------+--------------------------------------+ -|<<<MemNonHeapUsedM>>> | Current non-heap memory used in MB -*-------------------------------------+--------------------------------------+ -|<<<MemNonHeapCommittedM>>> | Current non-heap memory committed in MB -*-------------------------------------+--------------------------------------+ -|<<<MemNonHeapMaxM>>> | Max non-heap memory size in MB -*-------------------------------------+--------------------------------------+ -|<<<MemHeapUsedM>>> | Current heap memory used in MB -*-------------------------------------+--------------------------------------+ -|<<<MemHeapCommittedM>>> | Current heap memory committed in MB -*-------------------------------------+--------------------------------------+ -|<<<MemHeapMaxM>>> | Max heap memory size in MB -*-------------------------------------+--------------------------------------+ -|<<<MemMaxM>>> | Max memory size in MB -*-------------------------------------+--------------------------------------+ -|<<<ThreadsNew>>> | Current number of NEW threads -*-------------------------------------+--------------------------------------+ -|<<<ThreadsRunnable>>> | Current number of RUNNABLE threads -*-------------------------------------+--------------------------------------+ -|<<<ThreadsBlocked>>> | Current number of BLOCKED threads -*-------------------------------------+--------------------------------------+ -|<<<ThreadsWaiting>>> | Current number of WAITING threads -*-------------------------------------+--------------------------------------+ -|<<<ThreadsTimedWaiting>>> | Current number of TIMED_WAITING threads -*-------------------------------------+--------------------------------------+ -|<<<ThreadsTerminated>>> | Current number of TERMINATED threads -*-------------------------------------+--------------------------------------+ 
-|<<<GcInfo>>> | Total GC count and GC time in msec, grouped by the kind of GC. \ - | ex.) GcCountPS Scavenge=6, GCTimeMillisPS Scavenge=40, - | GCCountPS MarkSweep=0, GCTimeMillisPS MarkSweep=0 -*-------------------------------------+--------------------------------------+ -|<<<GcCount>>> | Total GC count -*-------------------------------------+--------------------------------------+ -|<<<GcTimeMillis>>> | Total GC time in msec -*-------------------------------------+--------------------------------------+ -|<<<LogFatal>>> | Total number of FATAL logs -*-------------------------------------+--------------------------------------+ -|<<<LogError>>> | Total number of ERROR logs -*-------------------------------------+--------------------------------------+ -|<<<LogWarn>>> | Total number of WARN logs -*-------------------------------------+--------------------------------------+ -|<<<LogInfo>>> | Total number of INFO logs -*-------------------------------------+--------------------------------------+ -|<<<GcNumWarnThresholdExceeded>>> | Number of times that the GC warn - | threshold is exceeded -*-------------------------------------+--------------------------------------+ -|<<<GcNumInfoThresholdExceeded>>> | Number of times that the GC info - | threshold is exceeded -*-------------------------------------+--------------------------------------+ -|<<<GcTotalExtraSleepTime>>> | Total GC extra sleep time in msec -*-------------------------------------+--------------------------------------+ - -rpc context - -* rpc - - Each metrics record contains tags such as Hostname - and port (number to which server is bound) - as additional information along with metrics. 
- -*-------------------------------------+--------------------------------------+ -|| Name || Description -*-------------------------------------+--------------------------------------+ -|<<<ReceivedBytes>>> | Total number of received bytes -*-------------------------------------+--------------------------------------+ -|<<<SentBytes>>> | Total number of sent bytes -*-------------------------------------+--------------------------------------+ -|<<<RpcQueueTimeNumOps>>> | Total number of RPC calls -*-------------------------------------+--------------------------------------+ -|<<<RpcQueueTimeAvgTime>>> | Average queue time in milliseconds -*-------------------------------------+--------------------------------------+ -|<<<RpcProcessingTimeNumOps>>> | Total number of RPC calls (same as - | RpcQueueTimeNumOps) -*-------------------------------------+--------------------------------------+ -|<<<RpcProcessingTimeAvgTime>>> | Average processing time in milliseconds -*-------------------------------------+--------------------------------------+ -|<<<RpcAuthenticationFailures>>> | Total number of authentication failures -*-------------------------------------+--------------------------------------+ -|<<<RpcAuthenticationSuccesses>>> | Total number of authentication successes -*-------------------------------------+--------------------------------------+ -|<<<RpcAuthorizationFailures>>> | Total number of authorization failures -*-------------------------------------+--------------------------------------+ -|<<<RpcAuthorizationSuccesses>>> | Total number of authorization successes -*-------------------------------------+--------------------------------------+ -|<<<NumOpenConnections>>> | Current number of open connections -*-------------------------------------+--------------------------------------+ -|<<<CallQueueLength>>> | Current length of the call queue -*-------------------------------------+--------------------------------------+ -|<<<rpcQueueTime>>><num><<<sNumOps>>> |
Shows total number of RPC calls -| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to -| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>. -*-------------------------------------+--------------------------------------+ -|<<<rpcQueueTime>>><num><<<s50thPercentileLatency>>> | -| | Shows the 50th percentile of RPC queue time in milliseconds -| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to -| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>. -*-------------------------------------+--------------------------------------+ -|<<<rpcQueueTime>>><num><<<s75thPercentileLatency>>> | -| | Shows the 75th percentile of RPC queue time in milliseconds -| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to -| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>. -*-------------------------------------+--------------------------------------+ -|<<<rpcQueueTime>>><num><<<s90thPercentileLatency>>> | -| | Shows the 90th percentile of RPC queue time in milliseconds -| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to -| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>. -*-------------------------------------+--------------------------------------+ -|<<<rpcQueueTime>>><num><<<s95thPercentileLatency>>> | -| | Shows the 95th percentile of RPC queue time in milliseconds -| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to -| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>. -*-------------------------------------+--------------------------------------+ -|<<<rpcQueueTime>>><num><<<s99thPercentileLatency>>> | -| | Shows the 99th percentile of RPC queue time in milliseconds -| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to -| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>. 
-*-------------------------------------+--------------------------------------+ -|<<<rpcProcessingTime>>><num><<<sNumOps>>> | Shows total number of RPC calls -| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to -| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>. -*-------------------------------------+--------------------------------------+ -|<<<rpcProcessingTime>>><num><<<s50thPercentileLatency>>> | -| | Shows the 50th percentile of RPC processing time in milliseconds -| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to -| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>. -*-------------------------------------+--------------------------------------+ -|<<<rpcProcessingTime>>><num><<<s75thPercentileLatency>>> | -| | Shows the 75th percentile of RPC processing time in milliseconds -| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to -| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>. -*-------------------------------------+--------------------------------------+ -|<<<rpcProcessingTime>>><num><<<s90thPercentileLatency>>> | -| | Shows the 90th percentile of RPC processing time in milliseconds -| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to -| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>. -*-------------------------------------+--------------------------------------+ -|<<<rpcProcessingTime>>><num><<<s95thPercentileLatency>>> | -| | Shows the 95th percentile of RPC processing time in milliseconds -| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to -| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>. 
-*-------------------------------------+--------------------------------------+ -|<<<rpcProcessingTime>>><num><<<s99thPercentileLatency>>> | -| | Shows the 99th percentile of RPC processing time in milliseconds -| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to -| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>. -*-------------------------------------+--------------------------------------+ - -* RetryCache/NameNodeRetryCache - - RetryCache metrics are useful for monitoring NameNode fail-over. - Each metrics record contains a Hostname tag. - -*-------------------------------------+--------------------------------------+ -|| Name || Description -*-------------------------------------+--------------------------------------+ -|<<<CacheHit>>> | Total number of RetryCache hits -*-------------------------------------+--------------------------------------+ -|<<<CacheCleared>>> | Total number of RetryCache cleared -*-------------------------------------+--------------------------------------+ -|<<<CacheUpdated>>> | Total number of RetryCache updated -*-------------------------------------+--------------------------------------+ - -rpcdetailed context - - Metrics of the rpcdetailed context are exposed in a unified manner by the RPC - layer. Two metrics are exposed for each RPC based on its name. - Metrics named "(RPC method name)NumOps" indicate the total number of - method calls, and metrics named "(RPC method name)AvgTime" show the - average turnaround time for method calls in milliseconds. - -* rpcdetailed - - Each metrics record contains tags such as Hostname - and port (number to which server is bound) - as additional information along with metrics. - - Metrics for RPCs that have not been called are not included - in the metrics record.
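As a rough illustration of the rpcdetailed naming convention described above, the following Python sketch groups a flat metrics record by RPC method name. The helper name `group_rpcdetailed`, the sample record, and all of its values are hypothetical and are not part of Hadoop; the sketch only assumes that each metric name is a method name followed by `NumOps` or `AvgTime`.

```python
from collections import defaultdict

# Suffixes used by the rpcdetailed context, as described in the text.
SUFFIXES = ("NumOps", "AvgTime")


def group_rpcdetailed(record):
    """Group '<method>NumOps' and '<method>AvgTime' entries by method name.

    Entries that do not end in one of the known suffixes (for example
    tags such as Hostname or port) are ignored.
    """
    grouped = defaultdict(dict)
    for name, value in record.items():
        for suffix in SUFFIXES:
            # Require a non-empty method name before the suffix.
            if name.endswith(suffix) and len(name) > len(suffix):
                method = name[:-len(suffix)]
                grouped[method][suffix] = value
                break
    return dict(grouped)


# Hypothetical sample record; the method names and values are made up.
sample = {
    "Hostname": "nn1.example.com",
    "port": 8020,
    "GetFileInfoNumOps": 120,
    "GetFileInfoAvgTime": 0.5,
    "MkdirsNumOps": 3,
    "MkdirsAvgTime": 2.0,
}

for method, stats in sorted(group_rpcdetailed(sample).items()):
    print(method, stats["NumOps"], stats["AvgTime"])
```

Because metrics for RPCs that were never called are absent from the record, a consumer written along these lines should treat a missing method as "no calls yet" rather than as an error.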
- -*-------------------------------------+--------------------------------------+ -|| Name || Description -*-------------------------------------+--------------------------------------+ -|<methodname><<<NumOps>>> | Total number of the times the method is called -*-------------------------------------+--------------------------------------+ -|<methodname><<<AvgTime>>> | Average turn around time of the method in - | milliseconds -*-------------------------------------+--------------------------------------+ - -dfs context - -* namenode - - Each metrics record contains tags such as ProcessName, SessionId, - and Hostname as additional information along with metrics. - -*-------------------------------------+--------------------------------------+ -|| Name || Description -*-------------------------------------+--------------------------------------+ -|<<<CreateFileOps>>> | Total number of files created -*-------------------------------------+--------------------------------------+ -|<<<FilesCreated>>> | Total number of files and directories created by create - | or mkdir operations -*-------------------------------------+--------------------------------------+ -|<<<FilesAppended>>> | Total number of files appended -*-------------------------------------+--------------------------------------+ -|<<<GetBlockLocations>>> | Total number of getBlockLocations operations -*-------------------------------------+--------------------------------------+ -|<<<FilesRenamed>>> | Total number of rename <<operations>> (NOT number of - | files/dirs renamed) -*-------------------------------------+--------------------------------------+ -|<<<GetListingOps>>> | Total number of directory listing operations -*-------------------------------------+--------------------------------------+ -|<<<DeleteFileOps>>> | Total number of delete operations -*-------------------------------------+--------------------------------------+ -|<<<FilesDeleted>>> | Total number of files and directories deleted 
by delete - | or rename operations -*-------------------------------------+--------------------------------------+ -|<<<FileInfoOps>>> | Total number of getFileInfo and getLinkFileInfo - | operations -*-------------------------------------+--------------------------------------+ -|<<<AddBlockOps>>> | Total number of addBlock operations succeeded -*-------------------------------------+--------------------------------------+ -|<<<GetAdditionalDatanodeOps>>> | Total number of getAdditionalDatanode - | operations -*-------------------------------------+--------------------------------------+ -|<<<CreateSymlinkOps>>> | Total number of createSymlink operations -*-------------------------------------+--------------------------------------+ -|<<<GetLinkTargetOps>>> | Total number of getLinkTarget operations -*-------------------------------------+--------------------------------------+ -|<<<FilesInGetListingOps>>> | Total number of files and directories listed by - | directory listing operations -*-------------------------------------+--------------------------------------+ -|<<<AllowSnapshotOps>>> | Total number of allowSnapshot operations -*-------------------------------------+--------------------------------------+ -|<<<DisallowSnapshotOps>>> | Total number of disallowSnapshot operations -*-------------------------------------+--------------------------------------+ -|<<<CreateSnapshotOps>>> | Total number of createSnapshot operations -*-------------------------------------+--------------------------------------+ -|<<<DeleteSnapshotOps>>> | Total number of deleteSnapshot operations -*-------------------------------------+--------------------------------------+ -|<<<RenameSnapshotOps>>> | Total number of renameSnapshot operations -*-------------------------------------+--------------------------------------+ -|<<<ListSnapshottableDirOps>>> | Total number of snapshottableDirectoryStatus - | operations 
-*-------------------------------------+--------------------------------------+ -|<<<SnapshotDiffReportOps>>> | Total number of getSnapshotDiffReport - | operations -*-------------------------------------+--------------------------------------+ -|<<<TransactionsNumOps>>> | Total number of Journal transactions -*-------------------------------------+--------------------------------------+ -|<<<TransactionsAvgTime>>> | Average time of Journal transactions in - | milliseconds -*-------------------------------------+--------------------------------------+ -|<<<SyncsNumOps>>> | Total number of Journal syncs -*-------------------------------------+--------------------------------------+ -|<<<SyncsAvgTime>>> | Average time of Journal syncs in milliseconds -*-------------------------------------+--------------------------------------+ -|<<<TransactionsBatchedInSync>>> | Total number of Journal transactions batched - | in sync -*-------------------------------------+--------------------------------------+ -|<<<BlockReportNumOps>>> | Total number of processing block reports from - | DataNode -*-------------------------------------+--------------------------------------+ -|<<<BlockReportAvgTime>>> | Average time of processing block reports in - | milliseconds -*-------------------------------------+--------------------------------------+ -|<<<CacheReportNumOps>>> | Total number of processing cache reports from - | DataNode -*-------------------------------------+--------------------------------------+ -|<<<CacheReportAvgTime>>> | Average time of processing cache reports in - | milliseconds -*-------------------------------------+--------------------------------------+ -|<<<SafeModeTime>>> | The interval between FSNameSystem starts and the last - | time safemode leaves in milliseconds. 
\ - | (sometimes not equal to the time in SafeMode, - | see {{{https://issues.apache.org/jira/browse/HDFS-5156}HDFS-5156}}) -*-------------------------------------+--------------------------------------+ -|<<<FsImageLoadTime>>> | Time loading FS Image at startup in milliseconds -*-------------------------------------+--------------------------------------+ -|<<<GetEditNumOps>>> | Total number of edits downloads from SecondaryNameNode -*-------------------------------------+--------------------------------------+ -|<<<GetEditAvgTime>>> | Average edits download time in milliseconds -*-------------------------------------+--------------------------------------+ -|<<<GetImageNumOps>>> | Total number of fsimage downloads from SecondaryNameNode -*-------------------------------------+--------------------------------------+ -|<<<GetImageAvgTime>>> | Average fsimage download time in milliseconds -*-------------------------------------+--------------------------------------+ -|<<<PutImageNumOps>>> | Total number of fsimage uploads to SecondaryNameNode -*-------------------------------------+--------------------------------------+ -|<<<PutImageAvgTime>>> | Average fsimage upload time in milliseconds -*-------------------------------------+--------------------------------------+ - -* FSNamesystem - - Each metrics record contains tags such as HAState and Hostname - as additional information along with metrics.
- -*-------------------------------------+--------------------------------------+ -|| Name || Description -*-------------------------------------+--------------------------------------+ -|<<<MissingBlocks>>> | Current number of missing blocks -*-------------------------------------+--------------------------------------+ -|<<<ExpiredHeartbeats>>> | Total number of expired heartbeats -*-------------------------------------+--------------------------------------+ -|<<<TransactionsSinceLastCheckpoint>>> | Total number of transactions since - | last checkpoint -*-------------------------------------+--------------------------------------+ -|<<<TransactionsSinceLastLogRoll>>> | Total number of transactions since last - | edit log roll -*-------------------------------------+--------------------------------------+ -|<<<LastWrittenTransactionId>>> | Last transaction ID written to the edit log -*-------------------------------------+--------------------------------------+ -|<<<LastCheckpointTime>>> | Time in milliseconds since epoch of last checkpoint -*-------------------------------------+--------------------------------------+ -|<<<CapacityTotal>>> | Current raw capacity of DataNodes in bytes -*-------------------------------------+--------------------------------------+ -|<<<CapacityTotalGB>>> | Current raw capacity of DataNodes in GB -*-------------------------------------+--------------------------------------+ -|<<<CapacityUsed>>> | Current used capacity across all DataNodes in bytes -*-------------------------------------+--------------------------------------+ -|<<<CapacityUsedGB>>> | Current used capacity across all DataNodes in GB -*-------------------------------------+--------------------------------------+ -|<<<CapacityRemaining>>> | Current remaining capacity in bytes -*-------------------------------------+--------------------------------------+ -|<<<CapacityRemainingGB>>> | Current remaining capacity in GB 
-*-------------------------------------+--------------------------------------+ -|<<<CapacityUsedNonDFS>>> | Current space used by DataNodes for non DFS - | purposes in bytes -*-------------------------------------+--------------------------------------+ -|<<<TotalLoad>>> | Current number of connections -*-------------------------------------+--------------------------------------+ -|<<<SnapshottableDirectories>>> | Current number of snapshottable directories -*-------------------------------------+--------------------------------------+ -|<<<Snapshots>>> | Current number of snapshots -*-------------------------------------+--------------------------------------+ -|<<<BlocksTotal>>> | Current number of allocated blocks in the system -*-------------------------------------+--------------------------------------+ -|<<<FilesTotal>>> | Current number of files and directories -*-------------------------------------+--------------------------------------+ -|<<<PendingReplicationBlocks>>> | Current number of blocks pending to be - | replicated -*-------------------------------------+--------------------------------------+ -|<<<UnderReplicatedBlocks>>> | Current number of blocks under replicated -*-------------------------------------+--------------------------------------+ -|<<<CorruptBlocks>>> | Current number of blocks with corrupt replicas. 
-*-------------------------------------+--------------------------------------+ -|<<<ScheduledReplicationBlocks>>> | Current number of blocks scheduled for - | replication -*-------------------------------------+--------------------------------------+ -|<<<PendingDeletionBlocks>>> | Current number of blocks pending deletion -*-------------------------------------+--------------------------------------+ -|<<<ExcessBlocks>>> | Current number of excess blocks -*-------------------------------------+--------------------------------------+ -|<<<PostponedMisreplicatedBlocks>>> | (HA-only) Current number of blocks - | whose replication is postponed -*-------------------------------------+--------------------------------------+ -|<<<PendingDataNodeMessageCount>>> | (HA-only) Current number of pending - | block-related messages for later - | processing in the standby NameNode -*-------------------------------------+--------------------------------------+ -|<<<MillisSinceLastLoadedEdits>>> | (HA-only) Time in milliseconds since the - | standby NameNode last loaded the edit log. - | Set to 0 on the active NameNode -*-------------------------------------+--------------------------------------+ -|<<<BlockCapacity>>> | Current block capacity -*-------------------------------------+--------------------------------------+ -|<<<StaleDataNodes>>> | Current number of DataNodes marked stale due to delayed - | heartbeat -*-------------------------------------+--------------------------------------+ -|<<<TotalFiles>>> | Current number of files and directories (same as FilesTotal) -*-------------------------------------+--------------------------------------+ - -* JournalNode - - The server-side metrics for a journal from the JournalNode's perspective. - Each metrics record contains a Hostname tag as additional information - along with metrics.
- -*-------------------------------------+--------------------------------------+ -|| Name || Description -*-------------------------------------+--------------------------------------+ -|<<<Syncs60sNumOps>>> | Number of sync operations (1 minute granularity) -*-------------------------------------+--------------------------------------+ -|<<<Syncs60s50thPercentileLatencyMicros>>> | The 50th percentile of sync -| | latency in microseconds (1 minute granularity) -*-------------------------------------+--------------------------------------+ -|<<<Syncs60s75thPercentileLatencyMicros>>> | The 75th percentile of sync -| | latency in microseconds (1 minute granularity) -*-------------------------------------+--------------------------------------+ -|<<<Syncs60s90thPercentileLatencyMicros>>> | The 90th percentile of sync -| | latency in microseconds (1 minute granularity) -*-------------------------------------+--------------------------------------+ -|<<<Syncs60s95thPercentileLatencyMicros>>> | The 95th percentile of sync -| | latency in microseconds (1 minute granularity) -*-------------------------------------+--------------------------------------+ -|<<<Syncs60s99thPercentileLatencyMicros>>> | The 99th percentile of sync -| | latency in microseconds (1 minute granularity) -*-------------------------------------+--------------------------------------+ -|<<<Syncs300sNumOps>>> | Number of sync operations (5 minutes granularity) -*-------------------------------------+--------------------------------------+ -|<<<Syncs300s50thPercentileLatencyMicros>>> | The 50th percentile of sync -| | latency in microseconds (5 minutes granularity) -*-------------------------------------+--------------------------------------+ -|<<<Syncs300s75thPercentileLatencyMicros>>> | The 75th percentile of sync -| | latency in microseconds (5 minutes granularity) -*-------------------------------------+--------------------------------------+ -|<<<Syncs300s90thPercentileLatencyMicros>>> | The 90th 
percentile of sync -| | latency in microseconds (5 minutes granularity) -*-------------------------------------+--------------------------------------+ -|<<<Syncs300s95thPercentileLatencyMicros>>> | The 95th percentile of sync -| | latency in microseconds (5 minutes granularity) -*-------------------------------------+--------------------------------------+ -|<<<Syncs300s99thPercentileLatencyMicros>>> | The 99th percentile of sync -| | latency in microseconds (5 minutes granularity) -*-------------------------------------+--------------------------------------+ -|<<<Syncs3600sNumOps>>> | Number of sync operations (1 hour granularity) -*-------------------------------------+--------------------------------------+ -|<<<Syncs3600s50thPercentileLatencyMicros>>> | The 50th percentile of sync -| | latency in microseconds (1 hour granularity) -*-------------------------------------+--------------------------------------+ -|<<<Syncs3600s75thPercentileLatencyMicros>>> | The 75th percentile of sync -| | latency in microseconds (1 hour granularity) -*-------------------------------------+--------------------------------------+ -|<<<Syncs3600s90thPercentileLatencyMicros>>> | The 90th percentile of sync -| | latency in microseconds (1 hour granularity) -*-------------------------------------+--------------------------------------+ -|<<<Syncs3600s95thPercentileLatencyMicros>>> | The 95th percentile of sync -| | latency in microseconds (1 hour granularity) -*-------------------------------------+--------------------------------------+ -|<<<Syncs3600s99thPercentileLatencyMicros>>> | The 99th percentile of sync -| | latency in microseconds (1 hour granularity) -*-------------------------------------+--------------------------------------+ -|<<<BatchesWritten>>> | Total number of batches written since startup -*-------------------------------------+--------------------------------------+ -|<<<TxnsWritten>>> | Total number of transactions written since startup 
-*-------------------------------------+--------------------------------------+ -|<<<BytesWritten>>> | Total number of bytes written since startup -*-------------------------------------+--------------------------------------+ -|<<<BatchesWrittenWhileLagging>>> | Total number of batches written where this -| | node was lagging -*-------------------------------------+--------------------------------------+ -|<<<LastWriterEpoch>>> | Current writer's epoch number -*-------------------------------------+--------------------------------------+ -|<<<CurrentLagTxns>>> | The number of transactions that this JournalNode is -| | lagging -*-------------------------------------+--------------------------------------+ -|<<<LastWrittenTxId>>> | The highest transaction id stored on this JournalNode -*-------------------------------------+--------------------------------------+ -|<<<LastPromisedEpoch>>> | The last epoch number which this node has promised -| | not to accept any lower epoch, or 0 if no promises have been made -*-------------------------------------+--------------------------------------+ - -* datanode - - Each metrics record contains tags such as SessionId and Hostname - as additional information along with metrics. 
- -*-------------------------------------+--------------------------------------+ -|| Name || Description -*-------------------------------------+--------------------------------------+ -|<<<BytesWritten>>> | Total number of bytes written to DataNode -*-------------------------------------+--------------------------------------+ -|<<<BytesRead>>> | Total number of bytes read from DataNode -*-------------------------------------+--------------------------------------+ -|<<<BlocksWritten>>> | Total number of blocks written to DataNode -*-------------------------------------+--------------------------------------+ -|<<<BlocksRead>>> | Total number of blocks read from DataNode -*-------------------------------------+--------------------------------------+ -|<<<BlocksReplicated>>> | Total number of blocks replicated -*-------------------------------------+--------------------------------------+ -|<<<BlocksRemoved>>> | Total number of blocks removed -*-------------------------------------+--------------------------------------+ -|<<<BlocksVerified>>> | Total number of blocks verified -*-------------------------------------+--------------------------------------+ -|<<<BlockVerificationFailures>>> | Total number of verifications failures -*-------------------------------------+--------------------------------------+ -|<<<BlocksCached>>> | Total number of blocks cached -*-------------------------------------+--------------------------------------+ -|<<<BlocksUncached>>> | Total number of blocks uncached -*-------------------------------------+--------------------------------------+ -|<<<ReadsFromLocalClient>>> | Total number of read operations from local client -*-------------------------------------+--------------------------------------+ -|<<<ReadsFromRemoteClient>>> | Total number of read operations from remote - | client -*-------------------------------------+--------------------------------------+ -|<<<WritesFromLocalClient>>> | Total number of write operations from 
local - | client -*-------------------------------------+--------------------------------------+ -|<<<WritesFromRemoteClient>>> | Total number of write operations from remote - | client -*-------------------------------------+--------------------------------------+ -|<<<BlocksGetLocalPathInfo>>> | Total number of operations to get local path - | names of blocks -*-------------------------------------+--------------------------------------+ -|<<<FsyncCount>>> | Total number of fsync operations -*-------------------------------------+--------------------------------------+ -|<<<VolumeFailures>>> | Total number of volume failures that have occurred -*-------------------------------------+--------------------------------------+ -|<<<ReadBlockOpNumOps>>> | Total number of read operations -*-------------------------------------+--------------------------------------+ -|<<<ReadBlockOpAvgTime>>> | Average time of read operations in milliseconds -*-------------------------------------+--------------------------------------+ -|<<<WriteBlockOpNumOps>>> | Total number of write operations -*-------------------------------------+--------------------------------------+ -|<<<WriteBlockOpAvgTime>>> | Average time of write operations in milliseconds -*-------------------------------------+--------------------------------------+ -|<<<BlockChecksumOpNumOps>>> | Total number of blockChecksum operations -*-------------------------------------+--------------------------------------+ -|<<<BlockChecksumOpAvgTime>>> | Average time of blockChecksum operations in - | milliseconds -*-------------------------------------+--------------------------------------+ -|<<<CopyBlockOpNumOps>>> | Total number of block copy operations -*-------------------------------------+--------------------------------------+ -|<<<CopyBlockOpAvgTime>>> | Average time of block copy operations in - | milliseconds -*-------------------------------------+--------------------------------------+ -|<<<ReplaceBlockOpNumOps>>> | Total number of 
block replace operations -*-------------------------------------+--------------------------------------+ -|<<<ReplaceBlockOpAvgTime>>> | Average time of block replace operations in - | milliseconds -*-------------------------------------+--------------------------------------+ -|<<<HeartbeatsNumOps>>> | Total number of heartbeats -*-------------------------------------+--------------------------------------+ -|<<<HeartbeatsAvgTime>>> | Average heartbeat time in milliseconds -*-------------------------------------+--------------------------------------+ -|<<<BlockReportsNumOps>>> | Total number of block report operations -*-------------------------------------+--------------------------------------+ -|<<<BlockReportsAvgTime>>> | Average time of block report operations in - | milliseconds -*-------------------------------------+--------------------------------------+ -|<<<CacheReportsNumOps>>> | Total number of cache report operations -*-------------------------------------+--------------------------------------+ -|<<<CacheReportsAvgTime>>> | Average time of cache report operations in - | milliseconds -*-------------------------------------+--------------------------------------+ -|<<<PacketAckRoundTripTimeNanosNumOps>>> | Total number of ack round trip -*-------------------------------------+--------------------------------------+ -|<<<PacketAckRoundTripTimeNanosAvgTime>>> | Average time from ack send to -| | receive minus the downstream ack time in nanoseconds -*-------------------------------------+--------------------------------------+ -|<<<FlushNanosNumOps>>> | Total number of flushes -*-------------------------------------+--------------------------------------+ -|<<<FlushNanosAvgTime>>> | Average flush time in nanoseconds -*-------------------------------------+--------------------------------------+ -|<<<FsyncNanosNumOps>>> | Total number of fsync -*-------------------------------------+--------------------------------------+ -|<<<FsyncNanosAvgTime>>> | 
Average fsync time in nanoseconds -*-------------------------------------+--------------------------------------+ -|<<<SendDataPacketBlockedOnNetworkNanosNumOps>>> | Total number of sending - | packets -*-------------------------------------+--------------------------------------+ -|<<<SendDataPacketBlockedOnNetworkNanosAvgTime>>> | Average waiting time of -| | sending packets in nanoseconds -*-------------------------------------+--------------------------------------+ -|<<<SendDataPacketTransferNanosNumOps>>> | Total number of sending packets -*-------------------------------------+--------------------------------------+ -|<<<SendDataPacketTransferNanosAvgTime>>> | Average transfer time of sending - | packets in nanoseconds -*-------------------------------------+--------------------------------------+ - -yarn context - -* ClusterMetrics - - ClusterMetrics shows the metrics of the YARN cluster from the - ResourceManager's perspective. Each metrics record contains - Hostname tag as additional information along with metrics. 
- -*-------------------------------------+--------------------------------------+ -|| Name || Description -*-------------------------------------+--------------------------------------+ -|<<<NumActiveNMs>>> | Current number of active NodeManagers -*-------------------------------------+--------------------------------------+ -|<<<NumDecommissionedNMs>>> | Current number of decommissioned NodeManagers -*-------------------------------------+--------------------------------------+ -|<<<NumLostNMs>>> | Current number of lost NodeManagers for not sending - | heartbeats -*-------------------------------------+--------------------------------------+ -|<<<NumUnhealthyNMs>>> | Current number of unhealthy NodeManagers -*-------------------------------------+--------------------------------------+ -|<<<NumRebootedNMs>>> | Current number of rebooted NodeManagers -*-------------------------------------+--------------------------------------+ - -* QueueMetrics - - QueueMetrics shows an application queue from the - ResourceManager's perspective. Each metrics record shows - the statistics of each queue, and contains tags such as - queue name and Hostname as additional information along with metrics. - - For <<<running_>>><num> metrics such as <<<running_0>>>, you can set the - property <<<yarn.resourcemanager.metrics.runtime.buckets>>> in yarn-site.xml - to change the buckets. The default value is <<<60,300,1440>>>. 
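The bucket naming described above can be illustrated with a small lookup. This is only a sketch, not the ResourceManager's implementation: the function name `bucket_metric_name` is hypothetical, the bucket boundaries are taken from the default value <<<60,300,1440>>>, and treating each lower bound as inclusive is an assumption.

```python
from bisect import bisect_right


def bucket_metric_name(elapsed_minutes, buckets=(60, 300, 1440)):
    """Return the running_<num> metric an application would be counted under.

    `buckets` mirrors the comma-separated value of
    yarn.resourcemanager.metrics.runtime.buckets (in minutes).
    Assumption: an application exactly at a boundary belongs to the
    bucket that starts at that boundary.
    """
    # Each bucket is named after its lower bound; the first one starts at 0.
    lower_bounds = (0,) + tuple(buckets)
    # Count how many boundaries the elapsed time has reached or passed.
    index = bisect_right(buckets, elapsed_minutes)
    return "running_{0}".format(lower_bounds[index])


if __name__ == "__main__":
    for minutes in (10, 60, 299, 2000):
        print(minutes, bucket_metric_name(minutes))
```

With the default buckets, an application that has run for 10 minutes falls under <<<running_0>>> and one that has run for 2000 minutes falls under <<<running_1440>>>.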
- -*-------------------------------------+--------------------------------------+ -|| Name || Description -*-------------------------------------+--------------------------------------+ -|<<<running_0>>> | Current number of running applications whose elapsed time is - | less than 60 minutes -*-------------------------------------+--------------------------------------+ -|<<<running_60>>> | Current number of running applications whose elapsed time is - | between 60 and 300 minutes -*-------------------------------------+--------------------------------------+ -|<<<running_300>>> | Current number of running applications whose elapsed time is - | between 300 and 1440 minutes -*-------------------------------------+--------------------------------------+ -|<<<running_1440>>> | Current number of running applications whose elapsed time is - | more than 1440 minutes -*-------------------------------------+--------------------------------------+ -|<<<AppsSubmitted>>> | Total number of submitted applications -*-------------------------------------+--------------------------------------+ -|<<<AppsRunning>>> | Current number of running applications -*-------------------------------------+--------------------------------------+ -|<<<AppsPending>>> | Current number of applications that have not yet been - | assigned any containers -*-------------------------------------+--------------------------------------+ -|<<<AppsCompleted>>> | Total number of completed applications -*-------------------------------------+--------------------------------------+ -|<<<AppsKilled>>> | Total number of killed applications -*-------------------------------------+--------------------------------------+ -|<<<AppsFailed>>> | Total number of failed applications -*-------------------------------------+--------------------------------------+ -|<<<AllocatedMB>>> | Current allocated memory in MB -*-------------------------------------+--------------------------------------+ -|<<<AllocatedVCores>>> | 
Current allocated CPU in virtual cores -*-------------------------------------+--------------------------------------+ -|<<<AllocatedContainers>>> | Current number of allocated containers -*-------------------------------------+--------------------------------------+ -|<<<AggregateContainersAllocated>>> | Total number of allocated containers -*-------------------------------------+--------------------------------------+ -|<<<AggregateContainersReleased>>> | Total number of released containers -*-------------------------------------+--------------------------------------+ -|<<<AvailableMB>>> | Current available memory in MB -*-------------------------------------+--------------------------------------+ -|<<<AvailableVCores>>> | Current available CPU in virtual cores -*-------------------------------------+--------------------------------------+ -|<<<PendingMB>>> | Current pending memory resource requests in MB that are - | not yet fulfilled by the scheduler -*-------------------------------------+--------------------------------------+ -|<<<PendingVCores>>> | Current pending CPU allocation requests in virtual - | cores that are not yet fulfilled by the scheduler -*-------------------------------------+--------------------------------------+ -|<<<PendingContainers>>> | Current pending resource requests that are not - | yet fulfilled by the scheduler -*-------------------------------------+--------------------------------------+ -|<<<ReservedMB>>> | Current reserved memory in MB -*-------------------------------------+--------------------------------------+ -|<<<ReservedVCores>>> | Current reserved CPU in virtual cores -*-------------------------------------+--------------------------------------+ -|<<<ReservedContainers>>> | Current number of reserved containers -*-------------------------------------+--------------------------------------+ -|<<<ActiveUsers>>> | Current number of active users 
-*-------------------------------------+--------------------------------------+ -|<<<ActiveApplications>>> | Current number of active applications -*-------------------------------------+--------------------------------------+ -|<<<FairShareMB>>> | (FairScheduler only) Current fair share of memory in MB -*-------------------------------------+--------------------------------------+ -|<<<FairShareVCores>>> | (FairScheduler only) Current fair share of CPU in - | virtual cores -*-------------------------------------+--------------------------------------+ -|<<<MinShareMB>>> | (FairScheduler only) Minimum share of memory in MB -*-------------------------------------+--------------------------------------+ -|<<<MinShareVCores>>> | (FairScheduler only) Minimum share of CPU in virtual - | cores -*-------------------------------------+--------------------------------------+ -|<<<MaxShareMB>>> | (FairScheduler only) Maximum share of memory in MB -*-------------------------------------+--------------------------------------+ -|<<<MaxShareVCores>>> | (FairScheduler only) Maximum share of CPU in virtual - | cores -*-------------------------------------+--------------------------------------+ - -* NodeManagerMetrics - - NodeManagerMetrics shows the statistics of the containers in the node. - Each metrics record contains Hostname tag as additional information - along with metrics. 
- -*-------------------------------------+--------------------------------------+ -|| Name || Description -*-------------------------------------+--------------------------------------+ -|<<<containersLaunched>>> | Total number of launched containers -*-------------------------------------+--------------------------------------+ -|<<<containersCompleted>>> | Total number of successfully completed containers -*-------------------------------------+--------------------------------------+ -|<<<containersFailed>>> | Total number of failed containers -*-------------------------------------+--------------------------------------+ -|<<<containersKilled>>> | Total number of killed containers -*-------------------------------------+--------------------------------------+ -|<<<containersIniting>>> | Current number of initializing containers -*-------------------------------------+--------------------------------------+ -|<<<containersRunning>>> | Current number of running containers -*-------------------------------------+--------------------------------------+ -|<<<allocatedContainers>>> | Current number of allocated containers -*-------------------------------------+--------------------------------------+ -|<<<allocatedGB>>> | Current allocated memory in GB -*-------------------------------------+--------------------------------------+ -|<<<availableGB>>> | Current available memory in GB -*-------------------------------------+--------------------------------------+ - -ugi context - -* UgiMetrics - - UgiMetrics is related to user and group information. - Each metrics record contains Hostname tag as additional information - along with metrics. 
- -*-------------------------------------+--------------------------------------+ -|| Name || Description -*-------------------------------------+--------------------------------------+ -|<<<LoginSuccessNumOps>>> | Total number of successful kerberos logins -*-------------------------------------+--------------------------------------+ -|<<<LoginSuccessAvgTime>>> | Average time for successful kerberos logins in - | milliseconds -*-------------------------------------+--------------------------------------+ -|<<<LoginFailureNumOps>>> | Total number of failed kerberos logins -*-------------------------------------+--------------------------------------+ -|<<<LoginFailureAvgTime>>> | Average time for failed kerberos logins in - | milliseconds -*-------------------------------------+--------------------------------------+ -|<<<getGroupsNumOps>>> | Total number of group resolutions -*-------------------------------------+--------------------------------------+ -|<<<getGroupsAvgTime>>> | Average time for group resolution in milliseconds -*-------------------------------------+--------------------------------------+ -|<<<getGroups>>><num><<<sNumOps>>> | -| | Total number of group resolutions (<num> seconds granularity). <num> is -| | specified by <<<hadoop.user.group.metrics.percentiles.intervals>>>. -*-------------------------------------+--------------------------------------+ -|<<<getGroups>>><num><<<s50thPercentileLatency>>> | -| | Shows the 50th percentile of group resolution time in milliseconds -| | (<num> seconds granularity). <num> is specified by -| | <<<hadoop.user.group.metrics.percentiles.intervals>>>. -*-------------------------------------+--------------------------------------+ -|<<<getGroups>>><num><<<s75thPercentileLatency>>> | -| | Shows the 75th percentile of group resolution time in milliseconds -| | (<num> seconds granularity). <num> is specified by -| | <<<hadoop.user.group.metrics.percentiles.intervals>>>. 
-*-------------------------------------+--------------------------------------+ -|<<<getGroups>>><num><<<s90thPercentileLatency>>> | -| | Shows the 90th percentile of group resolution time in milliseconds -| | (<num> seconds granularity). <num> is specified by -| | <<<hadoop.user.group.metrics.percentiles.intervals>>>. -*-------------------------------------+--------------------------------------+ -|<<<getGroups>>><num><<<s95thPercentileLatency>>> | -| | Shows the 95th percentile of group resolution time in milliseconds -| | (<num> seconds granularity). <num> is specified by -| | <<<hadoop.user.group.metrics.percentiles.intervals>>>. -*-------------------------------------+--------------------------------------+ -|<<<getGroups>>><num><<<s99thPercentileLatency>>> | -| | Shows the 99th percentile of group resolution time in milliseconds -| | (<num> seconds granularity). <num> is specified by -| | <<<hadoop.user.group.metrics.percentiles.intervals>>>. -*-------------------------------------+--------------------------------------+ - -metricssystem context - -* MetricsSystem - - MetricsSystem shows the statistics for metrics snapshots and publishes. - Each metrics record contains Hostname tag as additional information - along with metrics. 
- -*-------------------------------------+--------------------------------------+ -|| Name || Description -*-------------------------------------+--------------------------------------+ -|<<<NumActiveSources>>> | Current number of active metrics sources -*-------------------------------------+--------------------------------------+ -|<<<NumAllSources>>> | Total number of metrics sources -*-------------------------------------+--------------------------------------+ -|<<<NumActiveSinks>>> | Current number of active sinks -*-------------------------------------+--------------------------------------+ -|<<<NumAllSinks>>> | Total number of sinks \ - | (BUT usually less than <<<NumActiveSinks>>>, - | see {{{https://issues.apache.org/jira/browse/HADOOP-9946}HADOOP-9946}}) -*-------------------------------------+--------------------------------------+ -|<<<SnapshotNumOps>>> | Total number of operations to snapshot statistics from - | a metrics source -*-------------------------------------+--------------------------------------+ -|<<<SnapshotAvgTime>>> | Average time in milliseconds to snapshot statistics - | from a metrics source -*-------------------------------------+--------------------------------------+ -|<<<PublishNumOps>>> | Total number of operations to publish statistics to a - | sink -*-------------------------------------+--------------------------------------+ -|<<<PublishAvgTime>>> | Average time in milliseconds to publish statistics to - | a sink -*-------------------------------------+--------------------------------------+ -|<<<DroppedPubAll>>> | Total number of dropped publishes -*-------------------------------------+--------------------------------------+ -|<<<Sink_>>><instance><<<NumOps>>> | Total number of sink operations for the - | <instance> -*-------------------------------------+--------------------------------------+ -|<<<Sink_>>><instance><<<AvgTime>>> | Average time in milliseconds of sink - | operations for the <instance> 
-*-------------------------------------+--------------------------------------+ -|<<<Sink_>>><instance><<<Dropped>>> | Total number of dropped sink operations - | for the <instance> -*-------------------------------------+--------------------------------------+ -|<<<Sink_>>><instance><<<Qsize>>> | Current queue length of sink operations \ - | (BUT always set to 0 because nothing - | increments this metric, see - | {{{https://issues.apache.org/jira/browse/HADOOP-9941}HADOOP-9941}}) -*-------------------------------------+--------------------------------------+ - -default context - -* StartupProgress - - StartupProgress metrics show the statistics of NameNode startup. - Four metrics are exposed for each startup phase based on its name. - The startup <phase>s are <<<LoadingFsImage>>>, <<<LoadingEdits>>>, - <<<SavingCheckpoint>>>, and <<<SafeMode>>>. - Each metrics record contains Hostname tag as additional information - along with metrics. - -*-------------------------------------+--------------------------------------+ -|| Name || Description -*-------------------------------------+--------------------------------------+ -|<<<ElapsedTime>>> | Total elapsed time in milliseconds -*-------------------------------------+--------------------------------------+ -|<<<PercentComplete>>> | Current rate completed in NameNode startup progress \ - | (The max value is not 100 but 1.0) -*-------------------------------------+--------------------------------------+ -|<phase><<<Count>>> | Total number of steps completed in the phase -*-------------------------------------+--------------------------------------+ -|<phase><<<ElapsedTime>>> | Total elapsed time in the phase in milliseconds -*-------------------------------------+--------------------------------------+ -|<phase><<<Total>>> | Total number of steps in the phase -*-------------------------------------+--------------------------------------+ -|<phase><<<PercentComplete>>> | Current rate completed in the phase \ - | (The 
max value is not 100 but 1.0) -*-------------------------------------+--------------------------------------+
http://git-wip-us.apache.org/repos/asf/hadoop/blob/e9d26fe9/hadoop-common-project/hadoop-common/src/site/apt/NativeLibraries.apt.vm ---------------------------------------------------------------------- diff --git a/hadoop-common-project/hadoop-common/src/site/apt/NativeLibraries.apt.vm b/hadoop-common-project/hadoop-common/src/site/apt/NativeLibraries.apt.vm deleted file mode 100644 index 866b428..0000000 --- a/hadoop-common-project/hadoop-common/src/site/apt/NativeLibraries.apt.vm +++ /dev/null @@ -1,205 +0,0 @@ -~~ Licensed under the Apache License, Version 2.0 (the "License"); -~~ you may not use this file except in compliance with the License. -~~ You may obtain a copy of the License at -~~ -~~ http://www.apache.org/licenses/LICENSE-2.0 -~~ -~~ Unless required by applicable law or agreed to in writing, software -~~ distributed under the License is distributed on an "AS IS" BASIS, -~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -~~ See the License for the specific language governing permissions and -~~ limitations under the License. See accompanying LICENSE file. - - --- - Native Libraries Guide - --- - --- - ${maven.build.timestamp} - -Native Libraries Guide - -%{toc|section=1|fromDepth=0} - -* Overview - - This guide describes the native hadoop library and includes a small - discussion about native shared libraries. - - Note: Depending on your environment, the term "native libraries" could - refer to all *.so's you need to compile; and, the term "native - compression" could refer to all *.so's you need to compile that are - specifically related to compression. Currently, however, this document - only addresses the native hadoop library (<<<libhadoop.so>>>). - The document for libhdfs library (<<<libhdfs.so>>>) is - {{{../hadoop-hdfs/LibHdfs.html}here}}. - -* Native Hadoop Library - - Hadoop has native implementations of certain components for performance - reasons and for non-availability of Java implementations. 
These - components are available in a single, dynamically-linked native library - called the native hadoop library. On the *nix platforms the library is - named <<<libhadoop.so>>>. - -* Usage - - It is fairly easy to use the native hadoop library: - - [[1]] Review the components. - - [[2]] Review the supported platforms. - - [[3]] Either download a hadoop release, which will include a pre-built - version of the native hadoop library, or build your own version of - the native hadoop library. Whether you download or build, the name - for the library is the same: libhadoop.so - - [[4]] Install the compression codec development packages (>zlib-1.2, - >gzip-1.2): - - * If you download the library, install one or more development - packages - whichever compression codecs you want to use with - your deployment. - - * If you build the library, it is mandatory to install both - development packages. - - [[5]] Check the runtime log files. - -* Components - - The native hadoop library includes various components: - - * Compression Codecs (bzip2, lz4, snappy, zlib) - - * Native IO utilities for {{{../hadoop-hdfs/ShortCircuitLocalReads.html} - HDFS Short-Circuit Local Reads}} and - {{{../hadoop-hdfs/CentralizedCacheManagement.html}Centralized Cache - Management in HDFS}} - - * CRC32 checksum implementation - -* Supported Platforms - - The native hadoop library is supported on *nix platforms only. The - library does not work with Cygwin or the Mac OS X platform. - - The native hadoop library is mainly used on the GNU/Linux platform and - has been tested on these distributions: - - * RHEL4/Fedora - - * Ubuntu - - * Gentoo - - On all the above distributions a 32/64 bit native hadoop library will - work with a respective 32/64 bit jvm. - -* Download - - The pre-built 32-bit i386-Linux native hadoop library is available as - part of the hadoop distribution and is located in the <<<lib/native>>> - directory. You can download the hadoop distribution from Hadoop Common - Releases. 
- - Be sure to install the zlib and/or gzip development packages - - whichever compression codecs you want to use with your deployment. - -* Build - - The native hadoop library is written in ANSI C and is built using the - GNU autotools-chain (autoconf, autoheader, automake, autoscan, - libtool). This means it should be straightforward to build the library - on any platform with a standards-compliant C compiler and the GNU - autotools-chain (see the supported platforms). - - The packages you need to install on the target platform are: - - * C compiler (e.g. GNU C Compiler) - - * GNU Autotools Chain: autoconf, automake, libtool - - * zlib-development package (stable version >= 1.2.0) - - * openssl-development package (e.g. libssl-dev) - - Once you have installed the prerequisite packages, use the standard hadoop - pom.xml file and pass along the native flag to build the native hadoop - library: - ----- - $ mvn package -Pdist,native -DskipTests -Dtar ----- - - You should see the newly-built library in: - ----- - $ hadoop-dist/target/hadoop-${project.version}/lib/native ----- - - Please note the following: - - * It is mandatory to install both the zlib and gzip development - packages on the target platform in order to build the native hadoop - library; however, for deployment it is sufficient to install just - one package if you wish to use only one codec. - - * It is necessary to have the correct 32/64 libraries for zlib, - depending on the 32/64 bit jvm for the target platform, in order to - build and deploy the native hadoop library. - -* Runtime - - The bin/hadoop script ensures that the native hadoop library is on the - library path via the system property: - <<<-Djava.library.path=<path> >>> - - During runtime, check the hadoop log files for your MapReduce tasks. 
- - * If everything is all right, then: - <<<DEBUG util.NativeCodeLoader - Trying to load the custom-built native-hadoop library...>>> - <<<INFO util.NativeCodeLoader - Loaded the native-hadoop library>>> - - * If something goes wrong, then: - <<<INFO util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable>>> - -* Check - - NativeLibraryChecker is a tool to check whether native libraries are loaded correctly. - You can launch NativeLibraryChecker as follows: - ----- - $ hadoop checknative -a - 14/12/06 01:30:45 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version - 14/12/06 01:30:45 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library - Native library checking: - hadoop: true /home/ozawa/hadoop/lib/native/libhadoop.so.1.0.0 - zlib: true /lib/x86_64-linux-gnu/libz.so.1 - snappy: true /usr/lib/libsnappy.so.1 - lz4: true revision:99 - bzip2: false ----- - - -* Native Shared Libraries - - You can load any native shared library using DistributedCache for - distributing and symlinking the library files. - - This example shows you how to distribute a shared library, mylib.so, - and load it from a MapReduce task. - - [[1]] First copy the library to the HDFS: - <<<bin/hadoop fs -copyFromLocal mylib.so.1 /libraries/mylib.so.1>>> - - [[2]] The job launching program should contain the following: - <<<DistributedCache.createSymlink(conf);>>> - <<<DistributedCache.addCacheFile("hdfs://host:port/libraries/mylib.so.1#mylib.so", conf);>>> - - [[3]] The MapReduce task can contain: - <<<System.loadLibrary("mylib.so");>>> - - Note: If you downloaded or built the native hadoop library, you don't - need to use DistributedCache to make the library available to your - MapReduce tasks. 
http://git-wip-us.apache.org/repos/asf/hadoop/blob/e9d26fe9/hadoop-common-project/hadoop-common/src/site/apt/RackAwareness.apt.vm ---------------------------------------------------------------------- diff --git a/hadoop-common-project/hadoop-common/src/site/apt/RackAwareness.apt.vm b/hadoop-common-project/hadoop-common/src/site/apt/RackAwareness.apt.vm deleted file mode 100644 index dbd8d92..0000000 --- a/hadoop-common-project/hadoop-common/src/site/apt/RackAwareness.apt.vm +++ /dev/null @@ -1,140 +0,0 @@ -~~ Licensed under the Apache License, Version 2.0 (the "License"); -~~ you may not use this file except in compliance with the License. -~~ You may obtain a copy of the License at -~~ -~~ http://www.apache.org/licenses/LICENSE-2.0 -~~ -~~ Unless required by applicable law or agreed to in writing, software -~~ distributed under the License is distributed on an "AS IS" BASIS, -~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -~~ See the License for the specific language governing permissions and -~~ limitations under the License. See accompanying LICENSE file. - - --- - Hadoop ${project.version} - Rack Awareness - --- - --- - ${maven.build.timestamp} - -%{toc|section=1|fromDepth=0} - -Rack Awareness - - Hadoop components are rack-aware. For example, HDFS block placement will - use rack awareness for fault tolerance by placing one block replica on a - different rack. This provides data availability in the event of a network - switch failure or partition within the cluster. - - Hadoop master daemons obtain the rack id of the cluster slaves by invoking - either an external script or java class as specified by configuration files. - Using either the java class or external script for topology, output must - adhere to the java <<org.apache.hadoop.net.DNSToSwitchMapping>> - interface. 
The interface expects a one-to-one correspondence to be - maintained and the topology information in the format of '/myrack/myhost', - where '/' is the topology delimiter, 'myrack' is the rack identifier, and - 'myhost' is the individual host. Assuming a single /24 subnet per rack, - one could use the format of '/192.168.100.0/192.168.100.5' as a - unique rack-host topology mapping. - - To use the java class for topology mapping, the class name is specified by - the <<topology.node.switch.mapping.impl>> parameter in the configuration - file. An example, NetworkTopology.java, is included with the hadoop - distribution and can be customized by the Hadoop administrator. Using a - Java class instead of an external script has a performance benefit in - that Hadoop doesn't need to fork an external process when a new slave node - registers itself. - - If implementing an external script, it will be specified with the - <<topology.script.file.name>> parameter in the configuration files. Unlike - the java class, the external topology script is not included with the Hadoop - distribution and is provided by the administrator. Hadoop will send - multiple IP addresses to ARGV when forking the topology script. The - number of IP addresses sent to the topology script is controlled with - <<net.topology.script.number.args>> and defaults to 100. If - <<net.topology.script.number.args>> was changed to 1, a topology script - would get forked for each IP submitted by DataNodes and/or NodeManagers. - - If <<topology.script.file.name>> or <<topology.node.switch.mapping.impl>> is - not set, the rack id '/default-rack' is returned for any passed IP address. - While this behavior appears desirable, it can cause issues with HDFS block - replication as default behavior is to write one replicated block off rack - and is unable to do so as there is only a single rack named '/default-rack'. 
-
-  An additional configuration setting is
-  <<mapreduce.jobtracker.taskcache.levels>> which determines the number of
-  levels (in the network topology) of caches MapReduce will use. So, for
-  example, if it is the default value of 2, two levels of caches will be
-  constructed - one for hosts (host -> task mapping) and another for racks
-  (rack -> task mapping), giving us our one-to-one mapping of '/myrack/myhost'.
-
-* {python Example}
-
-+-------------------------------+
-  #!/usr/bin/python
-  # this script makes assumptions about the physical environment.
-  #  1) each rack is its own layer 3 network with a /24 subnet, which
-  #     could be typical where each rack has its own
-  #     switch with uplinks to a central core router.
-  #
-  #             +-----------+
-  #             |core router|
-  #             +-----------+
-  #            /             \
-  #   +-----------+        +-----------+
-  #   |rack switch|        |rack switch|
-  #   +-----------+        +-----------+
-  #   | data node |        | data node |
-  #   +-----------+        +-----------+
-  #   | data node |        | data node |
-  #   +-----------+        +-----------+
-  #
-  #  2) topology script gets list of IP's as input, calculates network address, and prints '/network_address' as the rack for each IP.
-
-  import netaddr
-  import sys
-  sys.argv.pop(0)  # discard name of topology script from argv list as we just want IP addresses
-
-  netmask = '255.255.255.0'  # set netmask to what's being used in your environment. The example uses a /24
-
-  for ip in sys.argv:  # loop over list of datanode IP's
-      address = '{0}/{1}'.format(ip, netmask)  # format address string so it looks like 'ip/netmask' to make netaddr work
-      try:
-          network_address = netaddr.IPNetwork(address).network  # calculate and print network address
-          print("/{0}".format(network_address))
-      except Exception:
-          print("/rack-unknown")  # print catch-all value if unable to calculate network address
-+-------------------------------+
-
-* {bash Example}
-
-+-------------------------------+
-  #!/bin/bash
-  # Here's a bash example to show just how simple these scripts can be
-  # Assuming we have flat network with everything on a single switch, we can fake a rack topology.
-  # This could occur in a lab environment where we have limited nodes, like 2-8 physical machines on an unmanaged switch.
-  # This may also apply to multiple virtual machines running on the same physical hardware.
-  # The number of machines isn't important; what matters is that we are faking a network topology where there isn't one.
-  #
-  #       +----------+    +--------+
-  #       |jobtracker|    |datanode|
-  #       +----------+    +--------+
-  #              \        /
-  #  +--------+  +--------+  +--------+
-  #  |datanode|--| switch |--|datanode|
-  #  +--------+  +--------+  +--------+
-  #              /        \
-  #       +--------+    +--------+
-  #       |datanode|    |namenode|
-  #       +--------+    +--------+
-  #
-  # With this network topology, we are treating each host as a rack. This is being done by taking the last octet
-  # in the datanode's IP and prepending it with the word '/rack-'. The advantage for doing this is so HDFS
-  # can create its 'off-rack' block copy.
-  # 1) 'echo $@' will echo all ARGV values to xargs.
-  # 2) 'xargs' will enforce that we print a single argv value per line
-  # 3) 'awk' will split fields on dots and append the last field to the string '/rack-'. If awk
-  #    fails to split the input into four fields, it will still print '/rack-' followed by the last field value
-
-  echo $@ | xargs -n 1 | awk -F '.' '{print "/rack-"$NF}'
-+-------------------------------+
-
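For comparison, the Python example above can also be sketched with only the standard library. This is a minimal sketch and not part of the Hadoop distribution: it assumes Python 3 and one /24 network per rack, as in the example above. The function name resolve_racks and the prefix_len parameter are hypothetical names chosen for illustration. Like the original script, it prints one rack path per input address and falls back to '/rack-unknown' when an argument cannot be parsed.

```python
#!/usr/bin/env python3
# Hypothetical standard-library variant of the topology script above.
# Assumption: every rack is its own /24 network.
import ipaddress
import sys


def resolve_racks(ips, prefix_len=24):
    """Return one rack path ('/network_address') per input IP string."""
    racks = []
    for ip in ips:
        try:
            # strict=False lets us pass a host address and get its network back
            network = ipaddress.ip_network("{0}/{1}".format(ip, prefix_len), strict=False)
            racks.append("/{0}".format(network.network_address))
        except ValueError:
            # catch-all value if the argument is not a valid IP address
            racks.append("/rack-unknown")
    return racks


if __name__ == "__main__":
    # Hadoop passes the IP addresses as command-line arguments
    for rack in resolve_racks(sys.argv[1:]):
        print(rack)
```

Saved as an executable file and referenced by <<topology.script.file.name>>, a script like this would receive up to <<net.topology.script.number.args>> addresses per invocation and print one rack per address.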
