This is an automated email from the ASF dual-hosted git repository.

etudenhoefner pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/iceberg.git


The following commit(s) were added to refs/heads/master by this push:
     new 0850cb9e99 Docs: Add docs for Metrics Reporting (#8345)
0850cb9e99 is described below

commit 0850cb9e998bc5ddc2f121a851cff4ea289c6596
Author: Eduard Tudenhoefner <[email protected]>
AuthorDate: Mon Aug 21 09:57:01 2023 +0200

    Docs: Add docs for Metrics Reporting (#8345)
---
 docs/configuration.md     |   1 +
 docs/metrics-reporting.md | 174 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 175 insertions(+)

diff --git a/docs/configuration.md b/docs/configuration.md
index 7fa2d94adf..7c568e7e9a 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -139,6 +139,7 @@ Iceberg catalogs support using catalog properties to configure catalog behaviors
 | clients                           | 2                  | client pool size                                       |
 | cache-enabled                     | true               | Whether to cache catalog entries |
 | cache.expiration-interval-ms      | 30000              | How long catalog entries are locally cached, in milliseconds; 0 disables caching, negative values disable expiration |
+| metrics-reporter-impl | org.apache.iceberg.metrics.LoggingMetricsReporter | Custom `MetricsReporter` implementation to use in a catalog. See the [Metrics reporting](metrics-reporting) section for additional details |
 
 `HadoopCatalog` and `HiveCatalog` can access the properties in their constructors.
 Any other custom catalog can access the properties by implementing `Catalog.initialize(catalogName, catalogProperties)`.
diff --git a/docs/metrics-reporting.md b/docs/metrics-reporting.md
new file mode 100644
index 0000000000..5b87bfe5e1
--- /dev/null
+++ b/docs/metrics-reporting.md
@@ -0,0 +1,174 @@
+---
+title: "Metrics Reporting"
+url: metrics-reporting
+aliases:
+    - "tables/metrics-reporting"
+menu:
+    main:
+        parent: Tables
+        identifier: metrics_reporting
+        weight: 0
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Metrics Reporting
+
+As of 1.1.0, Iceberg supports the [`MetricsReporter`](../../../javadoc/{{% icebergVersion %}}/org/apache/iceberg/metrics/MetricsReporter.html) and [`MetricsReport`](../../../javadoc/{{% icebergVersion %}}/org/apache/iceberg/metrics/MetricsReport.html) APIs. Together, these two APIs define the different kinds of metrics reports and provide a pluggable way to deliver them.
+
+## Types of Reports
+
+### ScanReport
+A [`ScanReport`](../../../javadoc/{{% icebergVersion %}}/org/apache/iceberg/metrics/ScanReport.html) carries metrics collected during scan planning against a given table. In addition to general information about the table, such as the snapshot id or the table name, it includes metrics like:
+* total scan planning duration
+* number of data/delete files included in the result
+* number of data/delete manifests scanned/skipped
+* number of data/delete files scanned/skipped
+* number of equality/positional delete files scanned
+
+
+### CommitReport
+A [`CommitReport`](../../../javadoc/{{% icebergVersion %}}/org/apache/iceberg/metrics/CommitReport.html) carries metrics collected after committing changes to a table (that is, after producing a snapshot). In addition to general information about the table, such as the snapshot id or the table name, it includes metrics like:
+* total duration
+* number of attempts required for the commit to succeed
+* number of added/removed data/delete files
+* number of added/removed equality/positional delete files
+* number of added/removed equality/positional deletes
+
+
+## Available Metrics Reporters
+
+### [`LoggingMetricsReporter`](../../../javadoc/{{% icebergVersion %}}/org/apache/iceberg/metrics/LoggingMetricsReporter.html)
+
+This is the default metrics reporter when nothing else is configured. It writes each received report to the log. Example output is shown below:
+
+```
+INFO org.apache.iceberg.metrics.LoggingMetricsReporter - Received metrics report: 
+ScanReport{
+    tableName=scan-planning-with-eq-and-pos-delete-files, 
+    snapshotId=2, 
+    filter=ref(name="data") == "(hash-27fa7cc0)", 
+    schemaId=0, 
+    projectedFieldIds=[1, 2], 
+    projectedFieldNames=[id, data], 
+    scanMetrics=ScanMetricsResult{
+        totalPlanningDuration=TimerResult{timeUnit=NANOSECONDS, totalDuration=PT0.026569404S, count=1}, 
+        resultDataFiles=CounterResult{unit=COUNT, value=1}, 
+        resultDeleteFiles=CounterResult{unit=COUNT, value=2}, 
+        totalDataManifests=CounterResult{unit=COUNT, value=1}, 
+        totalDeleteManifests=CounterResult{unit=COUNT, value=1}, 
+        scannedDataManifests=CounterResult{unit=COUNT, value=1}, 
+        skippedDataManifests=CounterResult{unit=COUNT, value=0}, 
+        totalFileSizeInBytes=CounterResult{unit=BYTES, value=10}, 
+        totalDeleteFileSizeInBytes=CounterResult{unit=BYTES, value=20}, 
+        skippedDataFiles=CounterResult{unit=COUNT, value=0}, 
+        skippedDeleteFiles=CounterResult{unit=COUNT, value=0}, 
+        scannedDeleteManifests=CounterResult{unit=COUNT, value=1}, 
+        skippedDeleteManifests=CounterResult{unit=COUNT, value=0}, 
+        indexedDeleteFiles=CounterResult{unit=COUNT, value=2}, 
+        equalityDeleteFiles=CounterResult{unit=COUNT, value=1}, 
+        positionalDeleteFiles=CounterResult{unit=COUNT, value=1}}, 
+    metadata={
+        iceberg-version=Apache Iceberg 1.4.0-SNAPSHOT (commit 4868d2823004c8c256a50ea7c25cff94314cc135)}}
+```
+
+```
+INFO org.apache.iceberg.metrics.LoggingMetricsReporter - Received metrics report: 
+CommitReport{
+    tableName=scan-planning-with-eq-and-pos-delete-files, 
+    snapshotId=1, 
+    sequenceNumber=1, 
+    operation=append, 
+    commitMetrics=CommitMetricsResult{
+        totalDuration=TimerResult{timeUnit=NANOSECONDS, totalDuration=PT0.098429626S, count=1}, 
+        attempts=CounterResult{unit=COUNT, value=1}, 
+        addedDataFiles=CounterResult{unit=COUNT, value=1}, 
+        removedDataFiles=null, 
+        totalDataFiles=CounterResult{unit=COUNT, value=1}, 
+        addedDeleteFiles=null, 
+        addedEqualityDeleteFiles=null, 
+        addedPositionalDeleteFiles=null, 
+        removedDeleteFiles=null, 
+        removedEqualityDeleteFiles=null, 
+        removedPositionalDeleteFiles=null, 
+        totalDeleteFiles=CounterResult{unit=COUNT, value=0}, 
+        addedRecords=CounterResult{unit=COUNT, value=1}, 
+        removedRecords=null, 
+        totalRecords=CounterResult{unit=COUNT, value=1}, 
+        addedFilesSizeInBytes=CounterResult{unit=BYTES, value=10}, 
+        removedFilesSizeInBytes=null, 
+        totalFilesSizeInBytes=CounterResult{unit=BYTES, value=10}, 
+        addedPositionalDeletes=null, 
+        removedPositionalDeletes=null, 
+        totalPositionalDeletes=CounterResult{unit=COUNT, value=0}, 
+        addedEqualityDeletes=null, 
+        removedEqualityDeletes=null, 
+        totalEqualityDeletes=CounterResult{unit=COUNT, value=0}}, 
+    metadata={
+        iceberg-version=Apache Iceberg 1.4.0-SNAPSHOT (commit 4868d2823004c8c256a50ea7c25cff94314cc135)}}
+```
+
+
+### [`RESTMetricsReporter`](../../../javadoc/{{% icebergVersion %}}/org/apache/iceberg/rest/RESTMetricsReporter.html)
+
+This is the default metrics reporter when using the [`RESTCatalog`](../../../javadoc/{{% icebergVersion %}}/org/apache/iceberg/rest/RESTCatalog.html). It sends metrics to a REST server at the `/v1/{prefix}/namespaces/{namespace}/tables/{table}/metrics` endpoint, as defined in the [REST OpenAPI spec](https://github.com/apache/iceberg/blob/master/open-api/rest-catalog-open-api.yaml).
+
+Sending metrics via REST can be controlled with the `rest-metrics-reporting-enabled` (defaults to `true`) property.
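+
+As a sketch of how this might look in practice, assuming a Spark deployment where catalog properties are passed with the `spark.sql.catalog.<catalog-name>.` prefix and a hypothetical catalog named `my_catalog`, REST metrics reporting could be turned off like this:
+
+```
+spark.sql.catalog.my_catalog.rest-metrics-reporting-enabled=false
+```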
+
+
+## Implementing a custom Metrics Reporter
+
+Implementing the [`MetricsReporter`](../../../javadoc/{{% icebergVersion %}}/org/apache/iceberg/metrics/MetricsReporter.html) API gives full flexibility in how incoming [`MetricsReport`](../../../javadoc/{{% icebergVersion %}}/org/apache/iceberg/metrics/MetricsReport.html) instances are handled. For example, a reporter could send results to a Prometheus endpoint or to any other observability framework or system.
+
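+Below is a short example illustrating an `InMemoryMetricsReporter` that stores reports in a list and makes them available:
+```java
+import java.util.List;
+import org.apache.iceberg.metrics.MetricsReport;
+import org.apache.iceberg.metrics.MetricsReporter;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+
+public class InMemoryMetricsReporter implements MetricsReporter {
+
+  // Reports are kept in arrival order; this list is not thread-safe.
+  private final List<MetricsReport> metricsReports = Lists.newArrayList();
+
+  @Override
+  public void report(MetricsReport report) {
+    metricsReports.add(report);
+  }
+
+  // Returns the reports received so far.
+  public List<MetricsReport> reports() {
+    return metricsReports;
+  }
+}
+```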
+
+## Registering a custom Metrics Reporter
+
+### Via Catalog Configuration
+
+The [catalog property](../configuration#catalog-properties) `metrics-reporter-impl` allows registering a given [`MetricsReporter`](../../../javadoc/{{% icebergVersion %}}/org/apache/iceberg/metrics/MetricsReporter.html) by specifying its fully-qualified class name, e.g. `metrics-reporter-impl=org.apache.iceberg.metrics.InMemoryMetricsReporter`.
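+
+For illustration, the property can be supplied wherever catalog properties are set. The fragment below is a sketch that assumes a Spark deployment, where catalog properties are passed with the `spark.sql.catalog.<catalog-name>.` prefix; the catalog name `my_catalog` is hypothetical:
+
+```
+spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog
+spark.sql.catalog.my_catalog.metrics-reporter-impl=org.apache.iceberg.metrics.InMemoryMetricsReporter
+```
+
+The named class needs to be available on the classpath of the process that loads the catalog.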
+
+### Via the Java API during Scan planning
+
+In addition to the [`MetricsReporter`](../../../javadoc/{{% icebergVersion %}}/org/apache/iceberg/metrics/MetricsReporter.html) registered at the catalog level via the `metrics-reporter-impl` property, additional reporters can be supplied during scan planning, as shown below:
+
+```java
+TableScan tableScan = 
+    table
+        .newScan()
+        .metricsReporter(customReporterOne)
+        .metricsReporter(customReporterTwo);
+
+try (CloseableIterable<FileScanTask> fileScanTasks = tableScan.planFiles()) {
+  // ...
+}
+```
\ No newline at end of file
