eric-wang-1990 commented on code in PR #3636:
URL: https://github.com/apache/arrow-adbc/pull/3636#discussion_r2467541107
##########
csharp/src/Drivers/Databricks/Telemetry/telemetry-activity-based-design.md:
##########
@@ -0,0 +1,918 @@
+# Databricks ADBC Driver: Activity-Based Telemetry Design
+
+## Executive Summary
+
+This document outlines an **Activity-based telemetry design** that leverages
the existing Activity/ActivitySource infrastructure in the Databricks ADBC
driver. Instead of creating a parallel telemetry system, we extend the current
tracing infrastructure to collect metrics and export them to Databricks
telemetry service.
+
+**Key Objectives:**
+- Reuse existing Activity instrumentation points
+- Add metrics collection without duplicating code
+- Export aggregated metrics to Databricks service
+- Maintain server-side feature flag control
+- Preserve backward compatibility with OpenTelemetry
+
+**Design Principles:**
+- **Build on existing infrastructure**: Leverage
ActivitySource/ActivityListener
+- **Single instrumentation point**: Don't duplicate tracing and metrics
+- **Non-blocking**: All operations async and non-blocking
+- **Privacy-first**: No PII or query data collected
+- **Server-controlled**: Feature flag support for enable/disable
+
+---
+
+## Table of Contents
+
+1. [Background & Motivation](#1-background--motivation)
+2. [Architecture Overview](#2-architecture-overview)
+3. [Core Components](#3-core-components)
+4. [Data Collection](#4-data-collection)
+5. [Export Mechanism](#5-export-mechanism)
+6. [Configuration](#6-configuration)
+7. [Privacy & Compliance](#7-privacy--compliance)
+8. [Error Handling](#8-error-handling)
+9. [Testing Strategy](#9-testing-strategy)
+10. [Migration & Rollout](#10-migration--rollout)
+11. [Alternatives Considered](#11-alternatives-considered)
+
+---
+
+## 1. Background & Motivation
+
+### 1.1 Current State
+
+The Databricks ADBC driver already has:
+- ✅ **ActivitySource**: `DatabricksAdbcActivitySource`
+- ✅ **Activity instrumentation**: Connection, statement execution, result
fetching
+- ✅ **W3C Trace Context**: Distributed tracing support
+- ✅ **ActivityTrace utility**: Helper for creating activities
+
+### 1.2 Design Opportunity
+
+The driver already has comprehensive Activity instrumentation for distributed
tracing. This presents an opportunity to:
+- Leverage existing Activity infrastructure for both tracing and metrics
+- Avoid duplicate instrumentation points in the driver code
+- Use a single data model (Activity) for both observability concerns
+- Maintain automatic correlation between traces and metrics
+- Reduce overall system complexity and maintenance burden
+
+### 1.3 The Approach
+
+**Extend Activity infrastructure** with metrics collection:
+- ✅ Single instrumentation point (Activity)
+- ✅ Custom ActivityListener for metrics aggregation
+- ✅ Export aggregated data to Databricks service
+- ✅ Reuse Activity context, correlation, and timing
+- ✅ Seamless integration with OpenTelemetry ecosystem
+
+---
+
+## 2. Architecture Overview
+
+### 2.1 High-Level Architecture
+
+```mermaid
+graph TB
+ A[Driver Operations] -->|Activity.Start/Stop| B[ActivitySource]
+ B -->|Activity Events| C[DatabricksActivityListener]
+ C -->|Aggregate Metrics| D[MetricsAggregator]
+ D -->|Batch & Buffer| E[DatabricksTelemetryExporter]
+ E -->|HTTP POST| F[Databricks Service]
+ F --> G[Lumberjack]
+
+ H[Feature Flag Service] -.->|Enable/Disable| C
+
+ style C fill:#e1f5fe
+ style D fill:#e1f5fe
+ style E fill:#e1f5fe
+```
+
+**Key Components:**
+1. **ActivitySource** (existing): Emits activities for all operations
+2. **DatabricksActivityListener** (new): Listens to activities, extracts
metrics
+3. **MetricsAggregator** (new): Aggregates by statement, batches events
+4. **DatabricksTelemetryExporter** (new): Exports to Databricks service
+
+### 2.2 Activity Flow
+
+```mermaid
+sequenceDiagram
+ participant App as Application
+ participant Driver as DatabricksConnection
+ participant AS as ActivitySource
+ participant AL as ActivityListener
+ participant MA as MetricsAggregator
+ participant Ex as TelemetryExporter
+ participant Service as Databricks Service
+
+ App->>Driver: ExecuteQueryAsync()
+ Driver->>AS: StartActivity("ExecuteQuery")
+ AS->>AL: ActivityStarted(activity)
+
+ Driver->>Driver: Execute operation
+ Driver->>AS: activity.SetTag("result_format", "CloudFetch")
+ Driver->>AS: activity.AddEvent("ChunkDownload", tags)
+
+ AS->>AL: ActivityStopped(activity)
+ AL->>MA: ProcessActivity(activity)
+ MA->>MA: Aggregate by statement_id
+
+ alt Batch threshold reached
+ MA->>Ex: Flush(batch)
+ Ex->>Service: POST /telemetry-ext
+ end
+```
+
+---
+
+## 3. Core Components
+
+### 3.1 DatabricksActivityListener
+
+**Purpose**: Listen to Activity events and extract metrics for Databricks
telemetry.
+
+**Location**:
`Apache.Arrow.Adbc.Drivers.Databricks.Telemetry.DatabricksActivityListener`
+
+#### Interface
+
+```csharp
+namespace Apache.Arrow.Adbc.Drivers.Databricks.Telemetry
+{
+ /// <summary>
+ /// Custom ActivityListener that aggregates metrics from Activity events
+ /// and exports them to Databricks telemetry service.
+ /// </summary>
+ public sealed class DatabricksActivityListener : IDisposable
+ {
+ public DatabricksActivityListener(
+ DatabricksConnection connection,
+ ITelemetryExporter exporter,
+ TelemetryConfiguration config);
+
+ // Start listening to activities
+ public void Start();
+
+ // Stop listening and flush pending metrics
+ public Task StopAsync();
+
+ public void Dispose();
+ }
+}
+```
+
+#### Activity Listener Configuration
+
+```csharp
+// Internal setup
+private ActivityListener CreateListener()
+{
+ return new ActivityListener
+ {
+ ShouldListenTo = source =>
+ source.Name == "Databricks.Adbc.Driver",
+
+ ActivityStarted = OnActivityStarted,
+ ActivityStopped = OnActivityStopped,
+
+ Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
+ _config.Enabled ? ActivitySamplingResult.AllDataAndRecorded
+ : ActivitySamplingResult.None
+ };
+}
+```
+
+#### Contracts
+
+**Activity Filtering**:
+- Only listen to `"Databricks.Adbc.Driver"` ActivitySource
+- Respects feature flag via `Sample` callback
+
+**Metric Extraction**:
+- Extract metrics from Activity tags
+- Aggregate by `statement_id` tag
+- Aggregate by `session_id` tag
+
+**Non-Blocking**:
+- All processing async
+- Never blocks Activity completion
+- Failures logged but don't propagate
+
+---
+
+### 3.2 MetricsAggregator
+
+**Purpose**: Aggregate Activity data into metrics suitable for Databricks
telemetry.
+
+**Location**:
`Apache.Arrow.Adbc.Drivers.Databricks.Telemetry.MetricsAggregator`
+
+#### Interface
+
+```csharp
+namespace Apache.Arrow.Adbc.Drivers.Databricks.Telemetry
+{
+ /// <summary>
+ /// Aggregates metrics from activities by statement and session.
+ /// </summary>
+ internal sealed class MetricsAggregator : IDisposable
+ {
+ public MetricsAggregator(
+ ITelemetryExporter exporter,
+ TelemetryConfiguration config);
+
+ // Process completed activity
+ public void ProcessActivity(Activity activity);
+
+ // Mark statement complete and emit aggregated metrics
+ public void CompleteStatement(string statementId);
+
+ // Flush all pending metrics
+ public Task FlushAsync(CancellationToken ct = default);
+
+ public void Dispose();
+ }
+}
+```
+
+#### Aggregation Logic
+
+```mermaid
+flowchart TD
+ A[Activity Stopped] --> B{Activity.OperationName}
+ B -->|Connection.Open| C[Emit Connection Event]
+ B -->|Statement.Execute| D[Aggregate by statement_id]
+ B -->|CloudFetch.Download| D
+ B -->|Statement.Complete| E[Emit Statement Event]
+
+ D --> F{Batch Size Reached?}
+ F -->|Yes| G[Flush to Exporter]
+ F -->|No| H[Continue Buffering]
+```
+
+#### Contracts
+
+**Statement Aggregation**:
+- Activities with same `statement_id` tag aggregated together
+- Aggregation includes: execution latency, chunk downloads, poll count
+- Emitted when statement marked complete
+
+**Connection-Level Events**:
+- Connection.Open emitted immediately
+- Driver configuration collected once per connection
+
+**Error Handling**:
+- Activity errors (tags with `error.type`) captured
+- Never throws exceptions
+
+---
+
+### 3.3 DatabricksTelemetryExporter
+
+**Purpose**: Export aggregated metrics to Databricks telemetry service.
+
+**Location**:
`Apache.Arrow.Adbc.Drivers.Databricks.Telemetry.DatabricksTelemetryExporter`
+
+#### Interface
+
+```csharp
+namespace Apache.Arrow.Adbc.Drivers.Databricks.Telemetry
+{
+ public interface ITelemetryExporter
+ {
+ /// <summary>
+ /// Export metrics to Databricks service. Never throws.
+ /// </summary>
+ Task ExportAsync(
+ IReadOnlyList<TelemetryMetric> metrics,
+ CancellationToken ct = default);
+ }
+
+ internal sealed class DatabricksTelemetryExporter : ITelemetryExporter
+ {
+ public DatabricksTelemetryExporter(
+ HttpClient httpClient,
+ DatabricksConnection connection,
+ TelemetryConfiguration config);
+
+ public Task ExportAsync(
+ IReadOnlyList<TelemetryMetric> metrics,
+ CancellationToken ct = default);
+ }
+}
+```
+
+**Same implementation as original design**: Circuit breaker, retry logic,
endpoints.
+
+---
+
+## 4. Data Collection
+
+### 4.1 Activity Tags for Metrics
+
+**Standard Activity Tags** (already exist):
+- `operation.name`: e.g., "Connection.Open", "Statement.Execute"
+- `db.operation`: SQL operation type
+- `db.statement`: Statement ID (not query text)
+- `server.address`: Databricks workspace host
+
+**New Tags for Metrics** (add to existing activities):
+- `result.format`: "inline" | "cloudfetch"
Review Comment:
Currently all tags are free formed as text, we should create another
folder/file for all the event and tags, just like how our internal tagging
system is implemented.
An event can be associated with multiple tags.
Tags can be annotated as export__local and/or export_databricks.
Then the filter logic will just refer to this static tag definition file.
And in the future any new tag should be added to this file.
##########
csharp/src/Drivers/Databricks/Telemetry/telemetry-activity-based-design.md:
##########
@@ -0,0 +1,918 @@
+# Databricks ADBC Driver: Activity-Based Telemetry Design
+
+## Executive Summary
+
+This document outlines an **Activity-based telemetry design** that leverages
the existing Activity/ActivitySource infrastructure in the Databricks ADBC
driver. Instead of creating a parallel telemetry system, we extend the current
tracing infrastructure to collect metrics and export them to Databricks
telemetry service.
+
+**Key Objectives:**
+- Reuse existing Activity instrumentation points
+- Add metrics collection without duplicating code
+- Export aggregated metrics to Databricks service
+- Maintain server-side feature flag control
+- Preserve backward compatibility with OpenTelemetry
+
+**Design Principles:**
+- **Build on existing infrastructure**: Leverage
ActivitySource/ActivityListener
+- **Single instrumentation point**: Don't duplicate tracing and metrics
+- **Non-blocking**: All operations async and non-blocking
+- **Privacy-first**: No PII or query data collected
+- **Server-controlled**: Feature flag support for enable/disable
+
+---
+
+## Table of Contents
+
+1. [Background & Motivation](#1-background--motivation)
+2. [Architecture Overview](#2-architecture-overview)
+3. [Core Components](#3-core-components)
+4. [Data Collection](#4-data-collection)
+5. [Export Mechanism](#5-export-mechanism)
+6. [Configuration](#6-configuration)
+7. [Privacy & Compliance](#7-privacy--compliance)
+8. [Error Handling](#8-error-handling)
+9. [Testing Strategy](#9-testing-strategy)
+10. [Migration & Rollout](#10-migration--rollout)
+11. [Alternatives Considered](#11-alternatives-considered)
+
+---
+
+## 1. Background & Motivation
+
+### 1.1 Current State
+
+The Databricks ADBC driver already has:
+- ✅ **ActivitySource**: `DatabricksAdbcActivitySource`
+- ✅ **Activity instrumentation**: Connection, statement execution, result
fetching
+- ✅ **W3C Trace Context**: Distributed tracing support
+- ✅ **ActivityTrace utility**: Helper for creating activities
+
+### 1.2 Design Opportunity
+
+The driver already has comprehensive Activity instrumentation for distributed
tracing. This presents an opportunity to:
+- Leverage existing Activity infrastructure for both tracing and metrics
+- Avoid duplicate instrumentation points in the driver code
+- Use a single data model (Activity) for both observability concerns
+- Maintain automatic correlation between traces and metrics
+- Reduce overall system complexity and maintenance burden
+
+### 1.3 The Approach
+
+**Extend Activity infrastructure** with metrics collection:
+- ✅ Single instrumentation point (Activity)
+- ✅ Custom ActivityListener for metrics aggregation
+- ✅ Export aggregated data to Databricks service
+- ✅ Reuse Activity context, correlation, and timing
+- ✅ Seamless integration with OpenTelemetry ecosystem
+
+---
+
+## 2. Architecture Overview
+
+### 2.1 High-Level Architecture
+
+```mermaid
+graph TB
+ A[Driver Operations] -->|Activity.Start/Stop| B[ActivitySource]
+ B -->|Activity Events| C[DatabricksActivityListener]
+ C -->|Aggregate Metrics| D[MetricsAggregator]
+ D -->|Batch & Buffer| E[DatabricksTelemetryExporter]
+ E -->|HTTP POST| F[Databricks Service]
+ F --> G[Lumberjack]
+
+ H[Feature Flag Service] -.->|Enable/Disable| C
+
+ style C fill:#e1f5fe
+ style D fill:#e1f5fe
+ style E fill:#e1f5fe
+```
+
+**Key Components:**
+1. **ActivitySource** (existing): Emits activities for all operations
+2. **DatabricksActivityListener** (new): Listens to activities, extracts
metrics
+3. **MetricsAggregator** (new): Aggregates by statement, batches events
+4. **DatabricksTelemetryExporter** (new): Exports to Databricks service
+
+### 2.2 Activity Flow
+
+```mermaid
+sequenceDiagram
+ participant App as Application
+ participant Driver as DatabricksConnection
+ participant AS as ActivitySource
+ participant AL as ActivityListener
+ participant MA as MetricsAggregator
+ participant Ex as TelemetryExporter
+ participant Service as Databricks Service
+
+ App->>Driver: ExecuteQueryAsync()
+ Driver->>AS: StartActivity("ExecuteQuery")
+ AS->>AL: ActivityStarted(activity)
+
+ Driver->>Driver: Execute operation
+ Driver->>AS: activity.SetTag("result_format", "CloudFetch")
+ Driver->>AS: activity.AddEvent("ChunkDownload", tags)
+
+ AS->>AL: ActivityStopped(activity)
+ AL->>MA: ProcessActivity(activity)
+ MA->>MA: Aggregate by statement_id
+
+ alt Batch threshold reached
+ MA->>Ex: Flush(batch)
+ Ex->>Service: POST /telemetry-ext
+ end
+```
+
+---
+
+## 3. Core Components
+
+### 3.1 DatabricksActivityListener
+
+**Purpose**: Listen to Activity events and extract metrics for Databricks
telemetry.
+
+**Location**:
`Apache.Arrow.Adbc.Drivers.Databricks.Telemetry.DatabricksActivityListener`
+
+#### Interface
+
+```csharp
+namespace Apache.Arrow.Adbc.Drivers.Databricks.Telemetry
+{
+ /// <summary>
+ /// Custom ActivityListener that aggregates metrics from Activity events
+ /// and exports them to Databricks telemetry service.
+ /// </summary>
+ public sealed class DatabricksActivityListener : IDisposable
+ {
+ public DatabricksActivityListener(
+ DatabricksConnection connection,
+ ITelemetryExporter exporter,
+ TelemetryConfiguration config);
+
+ // Start listening to activities
+ public void Start();
+
+ // Stop listening and flush pending metrics
+ public Task StopAsync();
+
+ public void Dispose();
+ }
+}
+```
+
+#### Activity Listener Configuration
+
+```csharp
+// Internal setup
+private ActivityListener CreateListener()
+{
+ return new ActivityListener
+ {
+ ShouldListenTo = source =>
+ source.Name == "Databricks.Adbc.Driver",
+
+ ActivityStarted = OnActivityStarted,
+ ActivityStopped = OnActivityStopped,
+
+ Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
+ _config.Enabled ? ActivitySamplingResult.AllDataAndRecorded
+ : ActivitySamplingResult.None
+ };
+}
+```
+
+#### Contracts
+
+**Activity Filtering**:
+- Only listen to `"Databricks.Adbc.Driver"` ActivitySource
+- Respects feature flag via `Sample` callback
+
+**Metric Extraction**:
+- Extract metrics from Activity tags
+- Aggregate by `statement_id` tag
+- Aggregate by `session_id` tag
+
+**Non-Blocking**:
+- All processing async
+- Never blocks Activity completion
+- Failures logged but don't propagate
+
+---
+
+### 3.2 MetricsAggregator
+
+**Purpose**: Aggregate Activity data into metrics suitable for Databricks
telemetry.
+
+**Location**:
`Apache.Arrow.Adbc.Drivers.Databricks.Telemetry.MetricsAggregator`
+
+#### Interface
+
+```csharp
+namespace Apache.Arrow.Adbc.Drivers.Databricks.Telemetry
+{
+ /// <summary>
+ /// Aggregates metrics from activities by statement and session.
+ /// </summary>
+ internal sealed class MetricsAggregator : IDisposable
+ {
+ public MetricsAggregator(
+ ITelemetryExporter exporter,
+ TelemetryConfiguration config);
+
+ // Process completed activity
+ public void ProcessActivity(Activity activity);
+
+ // Mark statement complete and emit aggregated metrics
+ public void CompleteStatement(string statementId);
+
+ // Flush all pending metrics
+ public Task FlushAsync(CancellationToken ct = default);
+
+ public void Dispose();
+ }
+}
+```
+
+#### Aggregation Logic
+
+```mermaid
+flowchart TD
+ A[Activity Stopped] --> B{Activity.OperationName}
+ B -->|Connection.Open| C[Emit Connection Event]
+ B -->|Statement.Execute| D[Aggregate by statement_id]
+ B -->|CloudFetch.Download| D
+ B -->|Statement.Complete| E[Emit Statement Event]
+
+ D --> F{Batch Size Reached?}
+ F -->|Yes| G[Flush to Exporter]
+ F -->|No| H[Continue Buffering]
+```
+
+#### Contracts
+
+**Statement Aggregation**:
+- Activities with same `statement_id` tag aggregated together
+- Aggregation includes: execution latency, chunk downloads, poll count
+- Emitted when statement marked complete
+
+**Connection-Level Events**:
+- Connection.Open emitted immediately
+- Driver configuration collected once per connection
+
+**Error Handling**:
+- Activity errors (tags with `error.type`) captured
+- Never throws exceptions
+
+---
+
+### 3.3 DatabricksTelemetryExporter
+
+**Purpose**: Export aggregated metrics to Databricks telemetry service.
+
+**Location**:
`Apache.Arrow.Adbc.Drivers.Databricks.Telemetry.DatabricksTelemetryExporter`
+
+#### Interface
+
+```csharp
+namespace Apache.Arrow.Adbc.Drivers.Databricks.Telemetry
+{
+ public interface ITelemetryExporter
+ {
+ /// <summary>
+ /// Export metrics to Databricks service. Never throws.
+ /// </summary>
+ Task ExportAsync(
+ IReadOnlyList<TelemetryMetric> metrics,
+ CancellationToken ct = default);
+ }
+
+ internal sealed class DatabricksTelemetryExporter : ITelemetryExporter
+ {
+ public DatabricksTelemetryExporter(
+ HttpClient httpClient,
+ DatabricksConnection connection,
+ TelemetryConfiguration config);
+
+ public Task ExportAsync(
+ IReadOnlyList<TelemetryMetric> metrics,
+ CancellationToken ct = default);
+ }
+}
+```
+
+**Same implementation as original design**: Circuit breaker, retry logic,
endpoints.
+
+---
+
+## 4. Data Collection
+
+### 4.1 Activity Tags for Metrics
+
+**Standard Activity Tags** (already exist):
+- `operation.name`: e.g., "Connection.Open", "Statement.Execute"
Review Comment:
This may need some tweak to map to the Databricks expected data model of
MetricType(Connection, Statement, etc).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]