This is an automated email from the ASF dual-hosted git repository.
etudenhoefner pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/iceberg.git
The following commit(s) were added to refs/heads/master by this push:
new 0850cb9e99 Docs: Add docs for Metrics Reporting (#8345)
0850cb9e99 is described below
commit 0850cb9e998bc5ddc2f121a851cff4ea289c6596
Author: Eduard Tudenhoefner <[email protected]>
AuthorDate: Mon Aug 21 09:57:01 2023 +0200
Docs: Add docs for Metrics Reporting (#8345)
---
docs/configuration.md | 1 +
docs/metrics-reporting.md | 174 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 175 insertions(+)
diff --git a/docs/configuration.md b/docs/configuration.md
index 7fa2d94adf..7c568e7e9a 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -139,6 +139,7 @@ Iceberg catalogs support using catalog properties to
configure catalog behaviors
| clients | 2 | client pool size
|
| cache-enabled | true | Whether to cache
catalog entries |
| cache.expiration-interval-ms | 30000 | How long catalog
entries are locally cached, in milliseconds; 0 disables caching, negative
values disable expiration |
+| metrics-reporter-impl | org.apache.iceberg.metrics.LoggingMetricsReporter |
Custom `MetricsReporter` implementation to use in a catalog. See the [Metrics
reporting](metrics-reporting) section for additional details |
`HadoopCatalog` and `HiveCatalog` can access the properties in their
constructors.
Any other custom catalog can access the properties by implementing
`Catalog.initialize(catalogName, catalogProperties)`.
diff --git a/docs/metrics-reporting.md b/docs/metrics-reporting.md
new file mode 100644
index 0000000000..5b87bfe5e1
--- /dev/null
+++ b/docs/metrics-reporting.md
@@ -0,0 +1,174 @@
+---
+title: "Metrics Reporting"
+url: metrics-reporting
+aliases:
+ - "tables/metrics-reporting"
+menu:
+ main:
+ parent: Tables
+ identifier: metrics_reporting
+ weight: 0
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements. See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License. You may obtain a copy of the License at
+ -
+ - http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Metrics Reporting
+
+As of 1.1.0 Iceberg supports the [`MetricsReporter`](../../../javadoc/{{%
icebergVersion %}}/org/apache/iceberg/metrics/MetricsReporter.html) and the
[`MetricsReport`](../../../javadoc/{{% icebergVersion
%}}/org/apache/iceberg/metrics/MetricsReport.html) APIs. These two APIs allow
expressing different metrics reports while supporting a pluggable way of
reporting these reports.
+
+## Type of Reports
+
+### ScanReport
+A [`ScanReport`](../../../javadoc/{{% icebergVersion
%}}/org/apache/iceberg/metrics/ScanReport.html) carries metrics being collected
during scan planning against a given table. Amongst some general information
about the involved table, such as the snapshot id or the table name, it
includes metrics like:
+* total scan planning duration
+* number of data/delete files included in the result
+* number of data/delete manifests scanned/skipped
+* number of data/delete files scanned/skipped
+* number of equality/positional delete files scanned
+
+
+### CommitReport
+A [`CommitReport`](../../../javadoc/{{% icebergVersion
%}}/org/apache/iceberg/metrics/CommitReport.html) carries metrics being
collected after committing changes to a table (aka producing a snapshot).
Amongst some general information about the involved table, such as the snapshot
id or the table name, it includes metrics like:
+* total duration
+* number of attempts required for the commit to succeed
+* number of added/removed data/delete files
+* number of added/removed equality/positional delete files
+* number of added/removed equality/positional deletes
+
+
+## Available Metrics Reporters
+
+### [`LoggingMetricsReporter`](../../../javadoc/{{% icebergVersion
%}}/org/apache/iceberg/metrics/LoggingMetricsReporter.html)
+
+This is the default metrics reporter when nothing else is configured and its
purpose is to log results to the log file. Example output would look as shown
below:
+
+```
+INFO org.apache.iceberg.metrics.LoggingMetricsReporter - Received metrics
report:
+ScanReport{
+ tableName=scan-planning-with-eq-and-pos-delete-files,
+ snapshotId=2,
+ filter=ref(name="data") == "(hash-27fa7cc0)",
+ schemaId=0,
+ projectedFieldIds=[1, 2],
+ projectedFieldNames=[id, data],
+ scanMetrics=ScanMetricsResult{
+ totalPlanningDuration=TimerResult{timeUnit=NANOSECONDS,
totalDuration=PT0.026569404S, count=1},
+ resultDataFiles=CounterResult{unit=COUNT, value=1},
+ resultDeleteFiles=CounterResult{unit=COUNT, value=2},
+ totalDataManifests=CounterResult{unit=COUNT, value=1},
+ totalDeleteManifests=CounterResult{unit=COUNT, value=1},
+ scannedDataManifests=CounterResult{unit=COUNT, value=1},
+ skippedDataManifests=CounterResult{unit=COUNT, value=0},
+ totalFileSizeInBytes=CounterResult{unit=BYTES, value=10},
+ totalDeleteFileSizeInBytes=CounterResult{unit=BYTES, value=20},
+ skippedDataFiles=CounterResult{unit=COUNT, value=0},
+ skippedDeleteFiles=CounterResult{unit=COUNT, value=0},
+ scannedDeleteManifests=CounterResult{unit=COUNT, value=1},
+ skippedDeleteManifests=CounterResult{unit=COUNT, value=0},
+ indexedDeleteFiles=CounterResult{unit=COUNT, value=2},
+ equalityDeleteFiles=CounterResult{unit=COUNT, value=1},
+ positionalDeleteFiles=CounterResult{unit=COUNT, value=1}},
+ metadata={
+ iceberg-version=Apache Iceberg 1.4.0-SNAPSHOT (commit
4868d2823004c8c256a50ea7c25cff94314cc135)}}
+```
+
+```
+INFO org.apache.iceberg.metrics.LoggingMetricsReporter - Received metrics
report:
+CommitReport{
+ tableName=scan-planning-with-eq-and-pos-delete-files,
+ snapshotId=1,
+ sequenceNumber=1,
+ operation=append,
+ commitMetrics=CommitMetricsResult{
+ totalDuration=TimerResult{timeUnit=NANOSECONDS,
totalDuration=PT0.098429626S, count=1},
+ attempts=CounterResult{unit=COUNT, value=1},
+ addedDataFiles=CounterResult{unit=COUNT, value=1},
+ removedDataFiles=null,
+ totalDataFiles=CounterResult{unit=COUNT, value=1},
+ addedDeleteFiles=null,
+ addedEqualityDeleteFiles=null,
+ addedPositionalDeleteFiles=null,
+ removedDeleteFiles=null,
+ removedEqualityDeleteFiles=null,
+ removedPositionalDeleteFiles=null,
+ totalDeleteFiles=CounterResult{unit=COUNT, value=0},
+ addedRecords=CounterResult{unit=COUNT, value=1},
+ removedRecords=null,
+ totalRecords=CounterResult{unit=COUNT, value=1},
+ addedFilesSizeInBytes=CounterResult{unit=BYTES, value=10},
+ removedFilesSizeInBytes=null,
+ totalFilesSizeInBytes=CounterResult{unit=BYTES, value=10},
+ addedPositionalDeletes=null,
+ removedPositionalDeletes=null,
+ totalPositionalDeletes=CounterResult{unit=COUNT, value=0},
+ addedEqualityDeletes=null,
+ removedEqualityDeletes=null,
+ totalEqualityDeletes=CounterResult{unit=COUNT, value=0}},
+ metadata={
+ iceberg-version=Apache Iceberg 1.4.0-SNAPSHOT (commit
4868d2823004c8c256a50ea7c25cff94314cc135)}}
+```
+
+
+### [`RESTMetricsReporter`](../../../javadoc/{{% icebergVersion
%}}/org/apache/iceberg/rest/RESTMetricsReporter.html)
+
+This is the default when using the [`RESTCatalog`](../../../javadoc/{{%
icebergVersion %}}/org/apache/iceberg/rest/RESTCatalog.html) and its purpose is
to send metrics to a REST server at the
`/v1/{prefix}/namespaces/{namespace}/tables/{table}/metrics` endpoint as
defined in the [REST OpenAPI
spec](https://github.com/apache/iceberg/blob/master/open-api/rest-catalog-open-api.yaml).
+
+Sending metrics via REST can be controlled with the
`rest-metrics-reporting-enabled` (defaults to `true`) property.
+
+
+## Implementing a custom Metrics Reporter
+
+Implementing the [`MetricsReporter`](../../../javadoc/{{% icebergVersion
%}}/org/apache/iceberg/metrics/MetricsReporter.html) API gives full flexibility
in dealing with incoming [`MetricsReport`](../../../javadoc/{{% icebergVersion
%}}/org/apache/iceberg/metrics/MetricsReport.html) instances. For example, it
would be possible to send results to a Prometheus endpoint or any other
observability framework/system.
+
+Below is a short example illustrating an `InMemoryMetricsReporter` that stores
reports in a list and makes them available:
+```java
+public class InMemoryMetricsReporter implements MetricsReporter {
+
+ private List<MetricsReport> metricsReports = Lists.newArrayList();
+
+ @Override
+ public void report(MetricsReport report) {
+ metricsReports.add(report);
+ }
+
+ public List<MetricsReport> reports() {
+ return metricsReports;
+ }
+}
+```
+
+## Registering a custom Metrics Reporter
+
+### Via Catalog Configuration
+
+The [catalog property](../configuration#catalog-properties)
`metrics-reporter-impl` allows registering a given
[`MetricsReporter`](../../../javadoc/{{% icebergVersion
%}}/org/apache/iceberg/metrics/MetricsReporter.html) by specifying its
fully-qualified class name, e.g.
`metrics-reporter-impl=org.apache.iceberg.metrics.InMemoryMetricsReporter`.
+
+### Via the Java API during Scan planning
+
+Independently of the [`MetricsReporter`](../../../javadoc/{{% icebergVersion
%}}/org/apache/iceberg/metrics/MetricsReporter.html) being registered at the
catalog level via the `metrics-reporter-impl` property, it is also possible to
supply additional reporters during scan planning as shown below:
+
+```java
+TableScan tableScan =
+ table
+ .newScan()
+ .metricsReporter(customReporterOne)
+ .metricsReporter(customReporterTwo);
+
+try (CloseableIterable<FileScanTask> fileScanTasks = tableScan.planFiles()) {
+ // ...
+}
+```
\ No newline at end of file