[ignite] branch master updated: IGNITE-20070 Ignite Docs: Remove docs related to DataRegionMetrics and so on (#10865)

isapego Thu, 31 Aug 2023 03:36:20 -0700

This is an automated email from the ASF dual-hosted git repository.

isapego pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/ignite.git



The following commit(s) were added to refs/heads/master by this push:
     new d43deea374b IGNITE-20070 Ignite Docs: Remove docs related to 
DataRegionMetrics and so on (#10865)
d43deea374b is described below

commit d43deea374b84cb698adc4025b644492328f8650
Author: liyujue <[email protected]>
AuthorDate: Thu Aug 31 18:36:09 2023 +0800

    IGNITE-20070 Ignite Docs: Remove docs related to DataRegionMetrics and so 
on (#10865)
---
 docs/_data/toc.yaml                                |  10 +-
 .../monitoring-metrics/configuring-metrics.adoc    |  27 --
 docs/_docs/monitoring-metrics/metrics.adoc         | 503 ---------------------
 .../monitoring-metrics/new-metrics-system.adoc     | 281 +++++++++++-
 4 files changed, 279 insertions(+), 542 deletions(-)

diff --git a/docs/_data/toc.yaml b/docs/_data/toc.yaml
index d5b0faf4974..065de1549e0 100644
--- a/docs/_data/toc.yaml
+++ b/docs/_data/toc.yaml
@@ -430,16 +430,12 @@
       url: monitoring-metrics/cluster-id
     - title: Cluster States
       url: monitoring-metrics/cluster-states
-    - title: Metrics
-      items: 
-        - title: Configuring Metrics
-          url: monitoring-metrics/configuring-metrics
-        - title: JMX Metrics
-          url: monitoring-metrics/metrics
-    - title: New Metrics System 
+    - title: Metrics System 
       items:
         - title: Introduction 
           url: monitoring-metrics/new-metrics-system
+        - title: Configuring Metrics
+          url: monitoring-metrics/configuring-metrics
         - title: Metrics
           url: monitoring-metrics/new-metrics
     - title: System Views
diff --git a/docs/_docs/monitoring-metrics/configuring-metrics.adoc 
b/docs/_docs/monitoring-metrics/configuring-metrics.adoc
index 929e4ec8d3a..8ccad23104d 100644
--- a/docs/_docs/monitoring-metrics/configuring-metrics.adoc
+++ b/docs/_docs/monitoring-metrics/configuring-metrics.adoc
@@ -80,14 +80,6 @@ include::{dotnetFile}[tags=data-region-metrics,indent=0]
 tab:C++[unsupported]
 --
 
-Data region metrics can be enabled/disabled at runtime via the following JMX 
Bean:
-
-*Data Region MBean*::
-+
-----
-org.apache:group=DataRegionMetrics,name=<Data Region Name>
-----
-
 == Enabling Persistence-related Metrics
 Persistence-related metrics can be enabled/disabled in the data storage 
configuration:
 
@@ -110,22 +102,3 @@ include::{dotnetFile}[tags=data-storage-metrics,indent=0]
 ----
 tab:C++[unsupported]
 --
-
-
-You can enable "Persistent Store" metrics at runtime via the following MXBean:
-
-*Persistent Store MBean*::
-+
---
-----
-org.apache:group="Persistent Store",name=DataStorageMetrics
-----
-[cols="1,4",opts="header"]
-|===
-| Operation | Description
-| EnableMetrics | Enable persistent data storage metrics.
-|===
---
-
-
-
diff --git a/docs/_docs/monitoring-metrics/metrics.adoc 
b/docs/_docs/monitoring-metrics/metrics.adoc
deleted file mode 100644
index 5511e3f593c..00000000000
--- a/docs/_docs/monitoring-metrics/metrics.adoc
+++ /dev/null
@@ -1,503 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one or more
-// contributor license agreements.  See the NOTICE file distributed with
-// this work for additional information regarding copyright ownership.
-// The ASF licenses this file to You under the Apache License, Version 2.0
-// (the "License"); you may not use this file except in compliance with
-// the License.  You may obtain a copy of the License at
-//
-// http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing, software
-// distributed under the License is distributed on an "AS IS" BASIS,
-// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-// See the License for the specific language governing permissions and
-// limitations under the License.
-= JMX Metrics
-
-:table_opts: cols="3,2,8,2", opts="stretch,header"
-
-== Overview
-
-The Apache Ignite 2.8 release introduced a new mechanism for collecting 
metrics, which is intended to replace all
-the legacy metrics below. Please, check the 
link:monitoring-metrics/new-metrics-system[New Metrics System].
-
-Ignite exposes a large number of metrics useful for monitoring your cluster or 
application.
-You can use JMX and a monitoring tool, such as JConsole to access these 
metrics via JMX.
-You can also access them programmatically.
-
-On this page, we've collected the most useful metrics and grouped them into 
various common categories based on the monitoring task.
-
-== Enabling JMX for Ignite
-
-By default, the JMX automatic configuration is disabled.
-To enable it, configure the following environment variables:
-
-* For `control.sh`, configure the `CONTROL_JVM_OPTS` variable
-* For `ignite.sh`, configure the `JVM_OPTS` variable
-
-For example:
-
-[source,shell]
-----
-JVM_OPTS="-Dcom.sun.management.jmxremote 
-Dcom.sun.management.jmxremote.port=${JMX_PORT} \
--Dcom.sun.management.jmxremote.authenticate=false 
-Dcom.sun.management.jmxremote.ssl=false"
-----
-
-
-// link:monitoring-metrics/configuring-metrics[Configuring Metrics]
-
-== Understanding MBean's ObjectName
-
-Every JMX Mbean has an 
https://docs.oracle.com/javase/8/docs/api/javax/management/ObjectName.html[ObjectName,window=_blank].
-The ObjectName is used to identify the bean.
-The ObjectName consists of a domain and a list of key properties, and can be 
represented as a string as follows:
-
-   domain: key1 = value1 , key2 = value2
-
-All Ignite metrics have the same domain: `org.apache.<classloaderId>` where 
the classloader ID is optional (omitted if you set 
`IGNITE_MBEAN_APPEND_CLASS_LOADER_ID=false`). In addition, each metric has two 
properties: `group` and `name`.
-For example:
-
-    org.apache:group=SPIs,name=TcpDiscoverySpi
-
-This MBean provides various metrics related to node discovery.
-
-The MBean ObjectName can be used to identify the bean in UI tools like 
JConsole.
-For example, JConsole displays MBeans in a tree-like structure where all beans 
are first grouped by domain and then by the 'group' property:
-
-image::images/jconsole.png[]
-
-{sp}+
-
-== Monitoring the Amount of Data
-
-If you do not use link:persistence/native-persistence[Native persistence] 
(i.e., all your data is kept in memory), you would want to monitor RAM usage.
-If you use Native persistence, in addition to RAM, you should monitor the size 
of the data storage on disk.
-
-The size of the data loaded into a node is available at different levels of 
aggregation. You can monitor for:
-
-* The total size of the data the node keeps on disk or in RAM. This amount is 
the sum of the size of each configured data region (in the simplest case, only 
the default data region) plus the sizes of the system data regions.
-* The size of a specific link:memory-configuration/data-regions[data region] 
on that node. The data region size is the sum of the sizes of all cache groups.
-* The size of a specific cache/cache group on that node, including the backup 
partitions.
-
-These metrics can be enabled/disabled for each level separately and are 
exposed via different JMX beans listed below.
-
-
-=== Allocated Space vs. Actual Size of Data
-
-There is no way to get the exact size of the data (neither in RAM nor on 
disk). Instead, there are two ways to estimate it.
-
-You can get the size of the space _allocated_ for storing the data.
-(The "space" here refers either to the space in RAM or on disk depending on 
whether you use Native persistence or not.)
-Space is allocated when the size of the storage gets full and more entries 
need to be added.
-However, when you remove entries from caches, the space is not deallocated.
-It is reused when new entries need to be added to the storage on subsequent 
write operations. Therefore, the allocated size does not decrease when you 
remove entries from the caches.
-The allocated size is available at the level of data storage, data region, and 
cache group metrics.
-The metric is called `TotalAllocatedSize`.
-
-You can also get an estimate of the actual size of data by multiplying the 
number of link:memory-centric-storage#data-pages[data pages] in use by the fill 
factor. The fill factor is the ratio of the size of data in a page to the page 
size, averaged over all pages. The number of pages in use and the fill factor 
are available at the level of data <<Data Region Size,region metrics>>.
-
-Add up the estimated size of all data regions to get the estimated total 
amount of data on the node.
-
-
-:allocsize_note: Note that when Native persistence is disabled, this metric 
shows the total size of the allocated space in RAM.
-
-=== Monitoring RAM Memory Usage
-The amount of data in RAM can be monitored for each data region through the 
following MBeans:
-
-Mbean's Object Name: ::
-+
---
-----
-group=DataRegionMetrics,name=<Data Region name>
-----
-[{table_opts}]
-|===
-| Attribute | Type | Description | Scope
-
-| PagesFillFactor| float | The average size of data in pages as a ratio of the 
page size. When Native persistence is enabled, this metric is applicable only 
to the persistent storage (i.e. pages on disk). | Node
-| TotalUsedPages | long | The number of data pages that are currently in use. 
When Native persistence is enabled, this metric is applicable only to the 
persistent storage (i.e. pages on disk).| Node
-| PhysicalMemoryPages |long | The number of the allocated pages in RAM. | Node
-| PhysicalMemorySize |long |The size of the allocated space in RAM in bytes. | 
Node
-|===
---
-
-If you have multiple data regions, add up the sizes of all data regions to get 
the total size of the data on the node.
-
-=== Monitoring Storage Size
-
-Persistent storage, when enabled, saves all application data on disk.
-The total amount of data each node keeps on disk consists of the persistent 
storage (application data), the 
link:persistence/native-persistence#write-ahead-log[WAL files], and 
link:persistence/native-persistence#wal-archive[WAL Archive] files.
-
-==== Persistent Storage Size
-To monitor the size of the persistent storage on disk, use the following 
metrics:
-
-Mbean's Object Name: ::
-+
---
-----
-group="Persistent Store",name=DataStorageMetrics
-----
-[{table_opts}]
-|===
-| Attribute | Type | Description | Scope
-| TotalAllocatedSize | long  | The size of the space allocated on disk for the 
entire data storage (in bytes). {allocsize_note} | Node
-| WalTotalSize | long | Total size of the WAL files in bytes, including the 
WAL archive files. | Node
-| WalArchiveSegments | int | The number of WAL segments in the archive.  | Node
-|===
-
-[cols="1,4",opts="header"]
-|===
-|Operation | Description
-| enableMetrics | Enable collection of metrics related to the persistent 
storage at runtime.
-| disableMetrics | Disable metrics collection.
-|===
---
-
-==== Data Region Size
-
-For each configured data region, Ignite creates a separate JMX Bean that 
exposes specific information about the region. Metrics collection for data 
regions are disabled by default. You can 
link:monitoring-metrics/configuring-metrics#enabling-data-region-metrics[enable 
it in the data region configuration, or via JMX at runtime] (see the Bean's 
operations below).
-
-The size of the data region on a node comprises the size of all partitions 
(including backup partitions) that this node owns for all caches in that data 
region.
-
-Data region metrics are available in the following MBean:
-
-Mbean's Object Name: ::
-+
---
-----
-group=DataRegionMetrics,name=<Data Region name>
-----
-
-[{table_opts}]
-|===
-| Attribute | Type | Description | Scope
-
-| TotalAllocatedSize | long  | The size of the space allocated for this data 
region (in bytes). {allocsize_note} | Node
-| PagesFillFactor| float | The average amount of data in pages as a ratio of 
the page size. | Node
-| TotalUsedPages | long | The number of data pages that are currently in use. 
| Node
-| PhysicalMemoryPages |long |The number of data pages in this data region held 
in RAM. | Node
-| PhysicalMemorySize | long |The size of the allocated space in RAM in bytes.| 
Node
-|===
-
-[cols="1,4",opts="header"]
-|===
-|Operation | Description
-| enableMetrics | Enable metrics collection for this data region.
-| disableMetrics | Disable metrics collection for this data region.
-|===
---
-
-==== Cache Group Size
-
-If you don't use link:configuring-caches/cache-groups[cache groups], each 
cache will be its own group.
-There is a separate JMX bean for each cache group.
-The name of the bean corresponds to the name of the group.
-
-Mbean's Object Name: ::
-+
---
-----
-group="Cache groups",name=<Cache group name>
-----
-[{table_opts}]
-|===
-| Attribute | Type | Description | Scope
-|TotalAllocatedSize |long | The amount of space allocated for the cache group 
on this node. | Node
-|===
---
-
-== Monitoring Checkpointing Operations
-Checkpointing may slow down cluster operations.
-You may want to monitor how much time each checkpoint operation takes, so that 
you can tune the properties that affect checkpointing.
-You may also want to monitor the disk performance to see if the slow-down is 
caused by external reasons.
-
-See link:persistence/persistence-tuning#pages-writes-throttling[Pages Writes 
Throttling] and 
link:persistence/persistence-tuning#adjusting-checkpointing-buffer-size[Checkpointing
 Buffer Size] for performance tips.
-
-Mbean's Object Name: ::
-+
---
-    group="Persistent Store",name=DataStorageMetrics
-[{table_opts}]
-|===
-| Attribute | Type | Description | Scope
-| DirtyPages  | long | The number of pages in memory that have been changed 
but not yet synchronized to disk. Those will be written to disk during next 
checkpoint. | Node
-|LastCheckpointDuration | long | The time in milliseconds it took to create 
the last checkpoint. | Node
-|CheckpointBufferSize | long | The size of the checkpointing buffer. | Global
-|===
---
-
-
-== Monitoring Rebalancing
-link:data-rebalancing[Rebalancing] is the process of moving partitions between 
the cluster nodes so that the data is always distributed in a balanced manner. 
Rebalancing is triggered when a new node joins, or an existing node leaves the 
cluster.
-
-If you have multiple caches, they will be rebalanced sequentially.
-There are several metrics that you can use to monitor the progress of the 
rebalancing process for a specific cache.
-
-In the new metric system, link:monitoring-metrics/new-metrics#caches[Cache 
metrics]:
-[{table_opts}]
-|===
-| Attribute | Type | Description | Scope
-|RebalancingStartTime | long | This metric shows the time when rebalancing of 
local partitions started for the cache. This metric will return 0 if the local 
partitions do not participate in the rebalancing. The time is returned in 
milliseconds. | Node
-| EstimatedRebalancingFinishTime | long | Expected time of completion of the 
rebalancing process. |  Node
-| KeysToRebalanceLeft | long | The number of keys on the node that remain to 
be rebalanced.  You can monitor this metric to learn when the rebalancing 
process finishes.| Node
-|===
---
-
-
-== Monitoring Topology
-Topology refers to the set of nodes in a cluster. There are a number of 
metrics that expose the information about the topology of the cluster. If the 
topology changes too frequently or has a size that is different from what you 
expect, you may want to look into whether there are network problems.
-
-
-Mbean's Object Name: ::
-+
---
-----
-group=Kernal,name=ClusterMetricsMXBeanImpl
-----
-[{table_opts}]
-|===
-| Attribute | Type | Description | Scope
-| TotalServerNodes| long  |The number of server nodes in the cluster.| Global
-| TotalClientNodes| long |The number of client nodes in the cluster. | Global
-| TotalBaselineNodes | long | The number of nodes that are registered in the 
link:clustering/baseline-topology[baseline topology]. When a node goes down, it 
remains registered in the baseline topology and you need to remote it manually. 
|  Global
-| ActiveBaselineNodes | long | The number of nodes that are currently active 
in the baseline topology.  |  Global
-|===
---
-
-Mbean's Object Name: ::
-+
---
-----
-group=SPIs,name=TcpDiscoverySpi
-----
-[{table_opts}]
-|===
-| Attribute | Type | Description | Scope
-| Coordinator | String | The node ID of the current coordinator node.| Global
-| CoordinatorNodeFormatted|String a|
-Detailed information about the coordinator node.
-....
-TcpDiscoveryNode [id=e07ad289-ff5b-4a73-b3d4-d323a661b6d4,
-consistentId=fa65ff2b-e7e2-4367-96d9-fd0915529c25,
-addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.25.4.200],
-sockAddrs=[mymachine.local/172.25.4.200:47500,
-/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500,
-order=2, intOrder=2, lastExchangeTime=1568187777249, loc=false,
-ver=8.7.5#20190520-sha1:d159cd7a, isClient=false]
-....
-
-| Global
-|===
---
-
-== Monitoring Caches
-
-See the new metric system, link:monitoring-metrics/new-metrics#caches[Cache 
metrics].
-
-=== Monitoring Build and Rebuild Indexes
-
-To get an estimate on how long it takes to rebuild cache indexes, you can use 
one of the metrics listed below:
-
-. `IsIndexRebuildInProgress` - tells whether indexes are being built or 
rebuilt at the moment;
-. `IndexBuildCountPartitionsLeft` - gives the remaining number of partitions 
(by cache group) for indexes to rebuild.
-
-Note that the `IndexBuildCountPartitionsLeft` metric allows to estimate only 
an approximate number of indexes left to rebuild.
-For a more accurate estimate, use the `IndexRebuildKeyProcessed` cache metric:
-
-* Use `isIndexRebuildInProgress` to know whether the indexes are being rebuilt 
for the cache.
-
-* Use `IndexRebuildKeysProcessed` to know the number of keys with rebuilt 
indexes. If the rebuilding is in progress, it gives a number of keys with 
indexes being rebuilt at the current moment. Otherwise, it gives a total number 
of the of keys with rebuilt indexes. The values are reset before the start of 
each rebuilding.
-
-== Monitoring Transactions
-Note that if a transaction spans multiple nodes (i.e., if the keys that are 
changed as a result of the transaction execution are located on multiple 
nodes), the counters will increase on each node. For example, the 
'TransactionsCommittedNumber' counter will increase on each node where the keys 
affected by the transaction are stored.
-
-Mbean's Object Name: ::
-+
---
-----
-group=TransactionMetrics,name=TransactionMetricsMxBeanImpl
-----
-
-[{table_opts}]
-|===
-| Attribute | Type | Description | Scope
-| LockedKeysNumber | long  | The number of keys locked on the node. | Node
-| TransactionsCommittedNumber |long | The number of transactions that have 
been committed on the node  | Node
-| TransactionsRolledBackNumber | long | The number of transactions that were 
rolled back. | Node
-| OwnerTransactionsNumber | long |  The number of transactions initiated on 
the node. | Node
-| TransactionsHoldingLockNumber | long | The number of open transactions that 
hold a lock on at least one key on the node.| Node
-|===
---
-
-////
-this isn't in 8.7.6 yet
-{sp}+
-
-Mbean's Object Name: ::
-`group=Transactions,name=TransactionsMXBeanImpl`
-*Attributes:*::
-{sp}
-+
---
-[{table_opts}]
-|===
-| Attribute | Type | Description | Scope
-| TotalNodeSystemTime  | long | system time | Node
-| TotalNodeUserTime |  | |  Node
-| NodeSystemTimeHistogram | | | Node
-| NodeUserTimeHistogram | |  | Node
-|===
---
-
-////
-
-
-////
-{sp}+
-
-
-== Monitoring Compute Jobs
-
-Mbean's Object Name: ::
-`group= ,name=`
-*Attributes:*::
-{sp}
-+
---
-[{table_opts}]
-|===
-| Attribute | Type | Description | Scope
-|  |  | |
-|===
---
-
-////
-
-
-////
-== Monitoring Snapshots
-
-Mbean's Object Name: ::
-+
---
-----
-group=TODO ,name= TODO
-----
-[{table_opts}]
-|===
-| Attribute | Type | Description | Scope
-| LastSnapshotOperation |  | |
-| LastSnapshotStartTime || |
-| SnapshotInProgress | | |
-|===
---
-////
-
-
-////
-== Monitoring Memory Consumption
-
-JVM memory
-
-Mbean's Object Name: ::
-+
-----
-group=Kernal,name=ClusterMetricsMXBeanImpl
-----
-*Attributes:*::
-+
-[{table_opts}]
-|===
-| Attribute | Type | Description | Scope
-| HeapMemoryUsed | long  | The Java heap size on the node. | Node
-|===
-
-////
-
-
-== Monitoring Client Connections
-Metrics related to JDBC/ODBC or thin client connections.
-
-Mbean's Object Name: ::
-+
---
-----
-group=Clients,name=ClientListenerProcessor
-----
-[{table_opts}]
-|===
-| Attribute | Type | Description | Scope
-| Connections | java.util.List<String> a| A list of strings, each string 
containing information about a connection:
-
-....
-JdbcClient [id=4294967297, user=<anonymous>,
-rmtAddr=127.0.0.1:39264, locAddr=127.0.0.1:10800]
-....
-| Node
-|===
-
-[cols="1,4",opts="header"]
-|===
-|Operation | Description
-| dropConnection (id)| Disconnect a specific client.
-| dropAllConnections | Disconnect all clients.
-|===
---
-
-
-== Monitoring Message Queues
-When thread pools queues' are growing, it means that the node cannot keep up 
with the load, or there was an error while processing messages in the queue.
-Continuous growth of the queue size can lead to OOM errors.
-
-
-=== Communication Message Queue
-The queue of outgoing communication messages contains communication messages 
that are waiting to be sent to other nodes.
-If the size is growing, it means there is a problem.
-
-Mbean's Object Name: ::
-+
---
-----
-group=SPIs,name=TcpCommunicationSpi
-----
-[{table_opts}]
-|===
-| Attribute | Type | Description | Scope
-| OutboundMessagesQueueSize  | int | The size of the queue of outgoing 
communication messages. | Node
-|===
---
-
-=== Discovery Messages Queue
-
-The queue of discovery messages.
-
-Mbean's Object Name: ::
-+
---
-----
-group=SPIs,name=TcpDiscoverySpi
-----
-[{table_opts}]
-|===
-| Attribute | Type | Description | Scope
-| MessageWorkerQueueSize | int | The size of the queue of discovery messages 
that are waiting to be sent to other nodes. | Node
-|AvgMessageProcessingTime|long| Average message processing time. | Node
-|===
---
-
-////
-
-== Monitoring Executor Queue Size
-
-There is a number of executor thread pools running within each node that are 
dedicated to specific tasks.
-You may want to monitor the size of the executor's queues.
-You can read more about the thread pools on the 
link:perf-troubleshooting-guide/thread-pools-tuning[Thread Tuning Page]
-
-There is a JMX Bean for each thread pool.
-
-////
-
-
-
-
-
diff --git a/docs/_docs/monitoring-metrics/new-metrics-system.adoc 
b/docs/_docs/monitoring-metrics/new-metrics-system.adoc
index e300ebbe3e1..62a6cba3158 100644
--- a/docs/_docs/monitoring-metrics/new-metrics-system.adoc
+++ b/docs/_docs/monitoring-metrics/new-metrics-system.adoc
@@ -12,17 +12,16 @@
 // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 // See the License for the specific language governing permissions and
 // limitations under the License.
-= New Metrics System
+= Metrics System
 
 :javaFile: {javaCodeDir}/ConfiguringMetrics.java
 
 == Overview
 
-Ignite 2.8 introduced a new mechanism for collecting metrics, which is 
intended to replace the link:monitoring-metrics/metrics[legacy metrics system].
-This section explains the new system and how you can use it to monitor your 
cluster.
-//the types of metrics and how to export them, but first let's explore the 
basic concepts of the new metrics mechanism in Ignite.
+This section explains the metrics system and how you can use it to monitor 
your cluster.
+//the types of metrics and how to export them, but first let's explore the 
basic concepts of the metrics mechanism in Ignite.
 
-Let's explore the basic concepts of the new metrics system in Ignite.
+Let's explore the basic concepts of the metrics system in Ignite.
 First, there are different metrics.
 Each metric has a name and a return value.
 The return value can be a simple value like `String`, `long`, or `double`, or 
can represent a Java object.
@@ -115,6 +114,47 @@ tab:C#/.NET[]
 tab:C++[unsupported]
 --
 
+==== Enabling JMX for Ignite
+
+By default, the JMX automatic configuration is disabled.
+To enable it, configure the following environment variables:
+
+* For `control.sh`, configure the `CONTROL_JVM_OPTS` variable
+* For `ignite.sh`, configure the `JVM_OPTS` variable
+
+For example:
+
+[source,shell]
+----
+JVM_OPTS="-Dcom.sun.management.jmxremote 
-Dcom.sun.management.jmxremote.port=${JMX_PORT} \
+-Dcom.sun.management.jmxremote.authenticate=false 
-Dcom.sun.management.jmxremote.ssl=false"
+----
+
+
+// link:monitoring-metrics/configuring-metrics[Configuring Metrics]
+
+==== Understanding MBean's ObjectName
+
+Every JMX Mbean has an 
https://docs.oracle.com/javase/8/docs/api/javax/management/ObjectName.html[ObjectName,window=_blank].
+The ObjectName is used to identify the bean.
+The ObjectName consists of a domain and a list of key properties, and can be 
represented as a string as follows:
+
+   domain: key1 = value1 , key2 = value2
+
+All Ignite metrics have the same domain: `org.apache.<classloaderId>` where 
the classloader ID is optional (omitted if you set 
`IGNITE_MBEAN_APPEND_CLASS_LOADER_ID=false`). In addition, each metric has two 
properties: `group` and `name`.
+For example:
+
+    org.apache:group=SPIs,name=TcpDiscoverySpi
+
+This MBean provides various metrics related to node discovery.
+
+The MBean ObjectName can be used to identify the bean in UI tools like 
JConsole.
+For example, JConsole displays MBeans in a tree-like structure where all beans 
are first grouped by domain and then by the 'group' property:
+
+image::images/jconsole.png[]
+
+{sp}+
+
 === SQL View
 
 `SqlViewMetricExporterSpi` is enabled by default, `SqlViewMetricExporterSpi` 
exposes metrics via the `SYS.METRICS` view.
@@ -209,5 +249,236 @@ Example of the metric names if the bounds are [10,100]:
 * `histogram_10_100` - between 10 and 100.
 * `histogram_100_inf` - more than 100.
 
+== Common monitoring tasks
+=== Monitoring the Amount of Data
+
+If you do not use link:persistence/native-persistence[Native persistence] 
(i.e., all your data is kept in memory), you would want to monitor RAM usage.
+If you use Native persistence, in addition to RAM, you should monitor the size 
of the data storage on disk.
+
+The size of the data loaded into a node is available at different levels of 
aggregation. You can monitor for:
+
+* The total size of the data the node keeps on disk or in RAM. This amount is 
the sum of the size of each configured data region (in the simplest case, only 
the default data region) plus the sizes of the system data regions.
+* The size of a specific link:memory-configuration/data-regions[data region] 
on that node. The data region size is the sum of the sizes of all cache groups.
+* The size of a specific cache/cache group on that node, including the backup 
partitions.
+
+
+==== Allocated Space vs. Actual Size of Data
+
+There is no way to get the exact size of the data (neither in RAM nor on 
disk). Instead, there are two ways to estimate it.
+
+You can get the size of the space _allocated_ for storing the data.
+(The "space" here refers either to the space in RAM or on disk depending on 
whether you use Native persistence or not.)
+Space is allocated when the size of the storage gets full and more entries 
need to be added.
+However, when you remove entries from caches, the space is not deallocated.
+It is reused when new entries need to be added to the storage on subsequent 
write operations. Therefore, the allocated size does not decrease when you 
remove entries from the caches.
+The allocated size is available at the level of data storage, data region, and 
cache group metrics.
+The metric is called `TotalAllocatedSize`.
+
+You can also get an estimate of the actual size of data by multiplying the 
number of link:memory-centric-storage#data-pages[data pages] in use by the fill 
factor. The fill factor is the ratio of the size of data in a page to the page 
size, averaged over all pages. The number of pages in use and the fill factor 
are available at the level of data <<Data Region Size,region metrics>>.
+
+Add up the estimated size of all data regions to get the estimated total 
amount of data on the node.
+
+
+:allocsize_note: Note that when Native persistence is disabled, this metric 
shows the total size of the allocated space in RAM.
+
+==== Monitoring RAM Memory Usage
+The amount of data in RAM can be monitored for each data region through the 
following metrics:
+
+[{table_opts}]
+|===
+| Attribute | Type | Description | Scope
+
+| PagesFillFactor| float | The average size of data in pages as a ratio of the 
page size. When Native persistence is enabled, this metric is applicable only 
to the persistent storage (i.e. pages on disk). | Node
+| TotalUsedPages | long | The number of data pages that are currently in use. 
When Native persistence is enabled, this metric is applicable only to the 
persistent storage (i.e. pages on disk).| Node
+| PhysicalMemoryPages |long | The number of the allocated pages in RAM. | Node
+| PhysicalMemorySize |long |The size of the allocated space in RAM in bytes. | 
Node
+|===
+
+If you have multiple data regions, add up the sizes of all data regions to get 
the total size of the data on the node.
+
+==== Monitoring Storage Size
+
+Persistent storage, when enabled, saves all application data on disk.
+The total amount of data each node keeps on disk consists of the persistent 
storage (application data), the 
link:persistence/native-persistence#write-ahead-log[WAL files], and 
link:persistence/native-persistence#wal-archive[WAL Archive] files.
+
+===== Persistent Storage Size
+To monitor the size of the persistent storage on disk, use the following 
metrics:
+
+[{table_opts}]
+|===
+| Attribute | Type | Description | Scope
+| TotalAllocatedSize | long  | The size of the space allocated on disk for the 
entire data storage (in bytes). {allocsize_note} | Node
+| WalTotalSize | long | Total size of the WAL files in bytes, including the 
WAL archive files. | Node
+| WalArchiveSegments | int | The number of WAL segments in the archive.  | Node
+|===
+
+===== Data Region Size
+
+For each configured data region, Metrics collection for data regions are 
disabled by default. You can 
link:monitoring-metrics/configuring-metrics#enabling-data-region-metrics[enable 
it in the data region configuration.
+
+The size of the data region on a node comprises the size of all partitions 
(including backup partitions) that this node owns for all caches in that data 
region.
 
+[{table_opts}]
+|===
+| Attribute | Type | Description | Scope
 
+| TotalAllocatedSize | long  | The size of the space allocated for this data 
region (in bytes). {allocsize_note} | Node
+| PagesFillFactor| float | The average amount of data in pages as a ratio of 
the page size. | Node
+| TotalUsedPages | long | The number of data pages that are currently in use. 
| Node
+| PhysicalMemoryPages |long |The number of data pages in this data region held 
in RAM. | Node
+| PhysicalMemorySize | long |The size of the allocated space in RAM in bytes.| 
Node
+|===
+
+===== Cache Group Size
+
+If you don't use link:configuring-caches/cache-groups[cache groups], each 
cache will be its own group.
+
+[{table_opts}]
+|===
+| Attribute | Type | Description | Scope
+|TotalAllocatedSize |long | The amount of space allocated for the cache group 
on this node. | Node
+|===
+
+=== Monitoring Checkpointing Operations
+Checkpointing may slow down cluster operations.
+You may want to monitor how much time each checkpoint operation takes, so that 
you can tune the properties that affect checkpointing.
+You may also want to monitor the disk performance to see if the slow-down is 
caused by external reasons.
+
+See link:persistence/persistence-tuning#pages-writes-throttling[Pages Writes 
Throttling] and 
link:persistence/persistence-tuning#adjusting-checkpointing-buffer-size[Checkpointing
 Buffer Size] for performance tips.
+
+[{table_opts}]
+|===
+| Attribute | Type | Description | Scope
+| DirtyPages  | long | The number of pages in memory that have been changed 
but not yet synchronized to disk. Those will be written to disk during next 
checkpoint. | Node
+|LastCheckpointDuration | long | The time in milliseconds it took to create 
the last checkpoint. | Node
+|CheckpointBufferSize | long | The size of the checkpointing buffer. | Global
+|===
+
+=== Monitoring Rebalancing
+link:data-rebalancing[Rebalancing] is the process of moving partitions between 
the cluster nodes so that the data is always distributed in a balanced manner. 
Rebalancing is triggered when a new node joins, or an existing node leaves the 
cluster.
+
+If you have multiple caches, they will be rebalanced sequentially.
+There are several metrics that you can use to monitor the progress of the 
rebalancing process for a specific cache.
+
+In the metric system, link:monitoring-metrics/new-metrics#caches[Cache 
metrics]:
+[{table_opts}]
+|===
+| Attribute | Type | Description | Scope
+|RebalancingStartTime | long | This metric shows the time when rebalancing of 
local partitions started for the cache. This metric will return 0 if the local 
partitions do not participate in the rebalancing. The time is returned in 
milliseconds. | Node
+| EstimatedRebalancingFinishTime | long | Expected time of completion of the 
rebalancing process. |  Node
+| KeysToRebalanceLeft | long | The number of keys on the node that remain to 
be rebalanced.  You can monitor this metric to learn when the rebalancing 
process finishes.| Node
+|===
+
+=== Monitoring Topology
+Topology refers to the set of nodes in a cluster. There are a number of 
metrics that expose the information about the topology of the cluster. If the 
topology changes too frequently or has a size that is different from what you 
expect, you may want to look into whether there are network problems.
+
+[{table_opts}]
+|===
+| Attribute | Type | Description | Scope
+| TotalServerNodes| long  |The number of server nodes in the cluster.| Global
+| TotalClientNodes| long |The number of client nodes in the cluster. | Global
+| TotalBaselineNodes | long | The number of nodes that are registered in the 
link:clustering/baseline-topology[baseline topology]. When a node goes down, it 
remains registered in the baseline topology and you need to remote it manually. 
|  Global
+| ActiveBaselineNodes | long | The number of nodes that are currently active 
in the baseline topology.  |  Global
+|===
+
+[{table_opts}]
+|===
+| Attribute | Type | Description | Scope
+| Coordinator | String | The node ID of the current coordinator node.| Global
+| CoordinatorNodeFormatted|String a|
+Detailed information about the coordinator node.
+....
+TcpDiscoveryNode [id=e07ad289-ff5b-4a73-b3d4-d323a661b6d4,
+consistentId=fa65ff2b-e7e2-4367-96d9-fd0915529c25,
+addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.25.4.200],
+sockAddrs=[mymachine.local/172.25.4.200:47500,
+/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500,
+order=2, intOrder=2, lastExchangeTime=1568187777249, loc=false,
+ver=8.7.5#20190520-sha1:d159cd7a, isClient=false]
+....
+
+| Global
+|===
+
+=== Monitoring Caches
+
+See the new metric system, link:monitoring-metrics/new-metrics#caches[Cache 
metrics].
+
+==== Monitoring Build and Rebuild Indexes
+
+To get an estimate on how long it takes to rebuild cache indexes, you can use 
one of the metrics listed below:
+
+. `IsIndexRebuildInProgress` - tells whether indexes are being built or 
rebuilt at the moment;
+. `IndexBuildCountPartitionsLeft` - gives the remaining number of partitions 
(by cache group) for indexes to rebuild.
+
+Note that the `IndexBuildCountPartitionsLeft` metric allows to estimate only 
an approximate number of indexes left to rebuild.
+For a more accurate estimate, use the `IndexRebuildKeyProcessed` cache metric:
+
+* Use `isIndexRebuildInProgress` to know whether the indexes are being rebuilt 
for the cache.
+
+* Use `IndexRebuildKeysProcessed` to know the number of keys with rebuilt 
indexes. If the rebuilding is in progress, it gives a number of keys with 
indexes being rebuilt at the current moment. Otherwise, it gives a total number 
of the of keys with rebuilt indexes. The values are reset before the start of 
each rebuilding.
+
+=== Monitoring Transactions
+Note that if a transaction spans multiple nodes (i.e., if the keys that are 
changed as a result of the transaction execution are located on multiple 
nodes), the counters will increase on each node. For example, the 
'TransactionsCommittedNumber' counter will increase on each node where the keys 
affected by the transaction are stored.
+
+[{table_opts}]
+|===
+| Attribute | Type | Description | Scope
+| LockedKeysNumber | long  | The number of keys locked on the node. | Node
+| TransactionsCommittedNumber |long | The number of transactions that have 
been committed on the node  | Node
+| TransactionsRolledBackNumber | long | The number of transactions that were 
rolled back. | Node
+| OwnerTransactionsNumber | long |  The number of transactions initiated on 
the node. | Node
+| TransactionsHoldingLockNumber | long | The number of open transactions that 
hold a lock on at least one key on the node.| Node
+|===
+
+=== Monitoring Snapshots
+
+[{table_opts}]
+|===
+| Attribute | Type | Description | Scope
+| LastSnapshotOperation |  | |
+| LastSnapshotStartTime || |
+| SnapshotInProgress | | |
+|===
+
+=== Monitoring Client Connections
+Metrics related to JDBC/ODBC or thin client connections.
+
+[{table_opts}]
+|===
+| Attribute | Type | Description | Scope
+| Connections | java.util.List<String> a| A list of strings, each string 
containing information about a connection:
+
+....
+JdbcClient [id=4294967297, user=<anonymous>,
+rmtAddr=127.0.0.1:39264, locAddr=127.0.0.1:10800]
+....
+| Node
+|===
+
+
+=== Monitoring Message Queues
+When thread pools queues' are growing, it means that the node cannot keep up 
with the load, or there was an error while processing messages in the queue.
+Continuous growth of the queue size can lead to OOM errors.
+
+==== Communication Message Queue
+The queue of outgoing communication messages contains communication messages 
that are waiting to be sent to other nodes.
+If the size is growing, it means there is a problem.
+
+[{table_opts}]
+|===
+| Attribute | Type | Description | Scope
+| OutboundMessagesQueueSize  | int | The size of the queue of outgoing 
communication messages. | Node
+|===
+
+==== Discovery Messages Queue
+
+The queue of discovery messages.
+
+[{table_opts}]
+|===
+| Attribute | Type | Description | Scope
+| MessageWorkerQueueSize | int | The size of the queue of discovery messages 
that are waiting to be sent to other nodes. | Node
+|AvgMessageProcessingTime|long| Average message processing time. | Node
+|===
+--

[ignite] branch master updated: IGNITE-20070 Ignite Docs: Remove docs related to DataRegionMetrics and so on (#10865)

Reply via email to