This is an automated email from the ASF dual-hosted git repository.

curth pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-adbc.git


The following commit(s) were added to refs/heads/main by this push:
     new 68a2d61b5 feat(csharp/Benchmarks): Add CloudFetch E2E performance benchmark (#3660)
68a2d61b5 is described below

commit 68a2d61b5e44ed364505f01419893fd2c40b06c6
Author: eric-wang-1990 <[email protected]>
AuthorDate: Mon Nov 3 09:39:51 2025 -0800

    feat(csharp/Benchmarks): Add CloudFetch E2E performance benchmark (#3660)
    
    ## Summary
    Adds a comprehensive E2E benchmark for Databricks CloudFetch that measures
    real-world performance against an actual cluster with configurable queries.
    
    ## Changes
    - **CloudFetchRealE2EBenchmark**: Real E2E benchmark against an actual
      Databricks cluster
      - Configurable via a JSON file (DATABRICKS_TEST_CONFIG_FILE
        environment variable)
      - Power BI consumption simulation with batch-size-proportional delays
        (5ms per 10K rows)
      - Peak memory tracking using Process.WorkingSet64
      - Custom peak memory column in the results table, with a console
        output reference
    
    - **CloudFetchBenchmarkRunner**: Standalone runner for CloudFetch
      benchmarks
      - Simplified to run only the real E2E benchmark
      - Optimized iteration counts (1 warmup + 3 actual) for faster
        execution
      - Hides confusing Error/StdDev columns from the summary table
    
    - **README.md**: Documentation for running and understanding the
      benchmarks
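
    The batch-size-proportional delay described above can be sketched as
    follows (a standalone illustration with hypothetical names, not the
    benchmark's exact code):
    ```csharp
    using System;
    using System.Threading;

    class DelaySketch
    {
        // Policy: ReadDelayMs milliseconds of simulated consumer work
        // per 10,000 rows in the batch.
        static int DelayForBatch(int batchLength, int readDelayMs)
            => (int)((batchLength / 10000.0) * readDelayMs);

        static void Main()
        {
            int delay = DelayForBatch(50_000, 5); // 25 ms for a 50K-row batch
            Console.WriteLine(delay);
            Thread.Sleep(delay); // simulate the consumer pausing between batches
        }
    }
    ```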
    
    ## Configuration
    The benchmark requires the `DATABRICKS_TEST_CONFIG_FILE` environment
    variable to point to a JSON config file:
    ```json
    {
      "uri": "https://your-workspace.cloud.databricks.com/sql/1.0/warehouses/xxx",
      "token": "dapi...",
      "query": "select * from main.tpcds_sf1_delta.catalog_sales"
    }
    ```
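
    For reference, a config of this shape can be deserialized with
    System.Text.Json; a minimal sketch (hypothetical class names; the
    properties are lowercase on purpose so the default case-sensitive
    binding matches the JSON keys without extra attributes):
    ```csharp
    using System;
    using System.Text.Json;

    class ConfigSketch
    {
        // Property names mirror the JSON keys exactly.
        class TestConfig
        {
            public string? uri { get; set; }
            public string? token { get; set; }
            public string? query { get; set; }
        }

        static void Main()
        {
            string json = "{\"uri\":\"https://host/sql/1.0/warehouses/x\",\"token\":\"dapi...\",\"query\":\"select 1\"}";
            var cfg = JsonSerializer.Deserialize<TestConfig>(json)
                ?? throw new InvalidOperationException("Failed to parse config");
            Console.WriteLine(cfg.uri);
        }
    }
    ```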
    
    ## Run Command
    ```bash
    export DATABRICKS_TEST_CONFIG_FILE=/path/to/config.json
    cd csharp
    dotnet run -c Release --project Benchmarks/Benchmarks.csproj --framework net8.0 CloudFetchBenchmarkRunner -- --filter "*"
    ```
    
    ## Example Output
    
    **Console output during benchmark execution:**
    ```
    Loaded config from: /path/to/databricks-config.json
    Hostname: adb-6436897454825492.12.azuredatabricks.net
    HTTP Path: /sql/1.0/warehouses/2f03dd43e35e2aa0
    Query: select * from main.tpcds_sf1_delta.catalog_sales
    Benchmark will test CloudFetch with 5ms per 10K rows read delay
    
    // Warmup
    CloudFetch E2E [Delay=5ms/10K rows] - Peak memory: 272.97 MB
    WorkloadWarmup   1: 1 op, 11566591709.00 ns, 11.5666 s/op
    
    // Actual iterations
    CloudFetch E2E [Delay=5ms/10K rows] - Peak memory: 249.11 MB
    WorkloadResult   1: 1 op, 8752445353.00 ns, 8.7524 s/op
    
    CloudFetch E2E [Delay=5ms/10K rows] - Peak memory: 261.95 MB
    WorkloadResult   2: 1 op, 9794630771.00 ns, 9.7946 s/op
    
    CloudFetch E2E [Delay=5ms/10K rows] - Peak memory: 258.39 MB
    WorkloadResult   3: 1 op, 9017280271.00 ns, 9.0173 s/op
    ```
    
    **Summary table:**
    ```
    BenchmarkDotNet v0.15.4, macOS Sequoia 15.7.1 (24G231) [Darwin 24.6.0]
    Apple M1 Max, 1 CPU, 10 logical and 10 physical cores
    .NET SDK 8.0.407
      [Host] : .NET 8.0.19 (8.0.19, 8.0.1925.36514), Arm64 RyuJIT armv8.0-a
    
    | Method            | ReadDelayMs | Mean    | Min     | Max     | Median  | Peak Memory (MB)            | Gen0       | Gen1       | Gen2       | Allocated |
    |------------------ |------------ |--------:|--------:|--------:|--------:|----------------------------:|-----------:|-----------:|-----------:|----------:|
    | ExecuteLargeQuery | 5           | 9.19 s  | 8.75 s  | 9.79 s  | 9.02 s  | See previous console output | 28000.0000 | 28000.0000 | 28000.0000 |   1.78 GB |
    ```
    
    **Key Metrics:**
    - **E2E Time**: 8.75-9.79 seconds (includes query execution, CloudFetch
    downloads, LZ4 decompression, batch consumption)
    - **Peak Memory**: 249-262 MB (tracked via Process.WorkingSet64, printed
    in console)
    - **Total Allocated**: 1.78 GB managed memory
    - **GC Collections**: 28K Gen0/Gen1/Gen2 collections
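
    The peak-memory figures come from sampling the process working set; a
    minimal standalone sketch of that approach (illustrative names, not the
    benchmark's exact code):
    ```csharp
    using System;
    using System.Diagnostics;

    class PeakMemorySketch
    {
        static void Main()
        {
            var proc = Process.GetCurrentProcess();
            long peakBytes = 0;

            for (int i = 0; i < 5; i++)
            {
                var batch = new byte[8 * 1024 * 1024]; // stand-in for a result batch
                batch[0] = 1;

                proc.Refresh();                   // refresh cached process counters
                long current = proc.WorkingSet64; // resident memory in bytes
                if (current > peakBytes) peakBytes = current;
            }

            Console.WriteLine($"Peak memory: {peakBytes / 1024.0 / 1024.0:F2} MB");
        }
    }
    ```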
    
    ## Test Plan
    - [x] Built successfully
    - [x] Verified benchmark runs with real Databricks cluster
    - [x] Confirmed peak memory tracking works
    - [x] Validated Power BI simulation delays are proportional to batch
    size
    - [x] Checked results table formatting
    
    🤖 Generated with [Claude Code](https://claude.com/claude-code)
    
    ---------
    
    Signed-off-by: Sreekanth Vadigi <[email protected]>
    Co-authored-by: Sreekanth Vadigi <[email protected]>
    Co-authored-by: Jade Wang <[email protected]>
    Co-authored-by: Claude <[email protected]>
---
 csharp/Benchmarks/Benchmarks.csproj                |   7 +
 csharp/Benchmarks/CloudFetchBenchmarkRunner.cs     |  44 ++++
 .../Databricks/CloudFetchRealE2EBenchmark.cs       | 286 +++++++++++++++++++++
 csharp/Benchmarks/Databricks/README.md             | 142 ++++++++++
 4 files changed, 479 insertions(+)

diff --git a/csharp/Benchmarks/Benchmarks.csproj b/csharp/Benchmarks/Benchmarks.csproj
index e06731aa1..9a5b7d931 100644
--- a/csharp/Benchmarks/Benchmarks.csproj
+++ b/csharp/Benchmarks/Benchmarks.csproj
@@ -6,6 +6,7 @@
     <ImplicitUsings>enable</ImplicitUsings>
     <Nullable>enable</Nullable>
     <ProcessArchitecture>$([System.Runtime.InteropServices.RuntimeInformation]::ProcessArchitecture.ToString().ToLowerInvariant())</ProcessArchitecture>
+    <StartupObject>Apache.Arrow.Adbc.Benchmarks.CloudFetchBenchmarkRunner</StartupObject>
   </PropertyGroup>
 
   <ItemGroup>
@@ -13,9 +14,15 @@
     <PackageReference Include="DuckDB.NET.Bindings.Full" GeneratePathProperty="true" />
   </ItemGroup>
 
+  <ItemGroup>
+    <PackageReference Include="K4os.Compression.LZ4" />
+    <PackageReference Include="K4os.Compression.LZ4.Streams" />
+  </ItemGroup>
+
   <ItemGroup>
     <ProjectReference Include="..\src\Apache.Arrow.Adbc\Apache.Arrow.Adbc.csproj" />
     <ProjectReference Include="..\src\Client\Apache.Arrow.Adbc.Client.csproj" />
+    <ProjectReference Include="..\src\Drivers\Databricks\Apache.Arrow.Adbc.Drivers.Databricks.csproj" />
     <ProjectReference Include="..\test\Apache.Arrow.Adbc.Tests\Apache.Arrow.Adbc.Tests.csproj" />
   </ItemGroup>
 
diff --git a/csharp/Benchmarks/CloudFetchBenchmarkRunner.cs b/csharp/Benchmarks/CloudFetchBenchmarkRunner.cs
new file mode 100644
index 000000000..13c7ad515
--- /dev/null
+++ b/csharp/Benchmarks/CloudFetchBenchmarkRunner.cs
@@ -0,0 +1,44 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*    http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+using Apache.Arrow.Adbc.Benchmarks.Databricks;
+using BenchmarkDotNet.Columns;
+using BenchmarkDotNet.Configs;
+using BenchmarkDotNet.Running;
+
+namespace Apache.Arrow.Adbc.Benchmarks
+{
+    /// <summary>
+    /// Standalone runner for CloudFetch benchmarks only.
+    /// Usage: dotnet run -c Release --framework net8.0 CloudFetchBenchmarkRunner
+    /// </summary>
+    public class CloudFetchBenchmarkRunner
+    {
+        public static void Main(string[] args)
+        {
+            // Configure to include the peak memory column and hide the confusing error column
+            var config = DefaultConfig.Instance
+                .AddColumn(new PeakMemoryColumn())
+                .HideColumns("Error", "StdDev");  // Hide statistical columns that are confusing with few iterations
+
+            // Run only the real E2E CloudFetch benchmark
+            var summary = BenchmarkSwitcher.FromTypes(new[] {
+                typeof(CloudFetchRealE2EBenchmark)          // Real E2E with Databricks (requires credentials)
+            }).Run(args, config);
+        }
+    }
+}
diff --git a/csharp/Benchmarks/Databricks/CloudFetchRealE2EBenchmark.cs b/csharp/Benchmarks/Databricks/CloudFetchRealE2EBenchmark.cs
new file mode 100644
index 000000000..5838442b8
--- /dev/null
+++ b/csharp/Benchmarks/Databricks/CloudFetchRealE2EBenchmark.cs
@@ -0,0 +1,286 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*    http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+using System;
+using System.Collections.Generic;
+using System.Diagnostics;
+using System.IO;
+using System.Text.Json;
+using System.Threading;
+using System.Threading.Tasks;
+using Apache.Arrow.Adbc.Drivers.Apache.Spark;
+using Apache.Arrow.Adbc.Drivers.Databricks;
+using Apache.Arrow.Ipc;
+using BenchmarkDotNet.Attributes;
+using BenchmarkDotNet.Columns;
+using BenchmarkDotNet.Reports;
+using BenchmarkDotNet.Running;
+
+namespace Apache.Arrow.Adbc.Benchmarks.Databricks
+{
+    /// <summary>
+    /// Custom column to display peak memory usage in the benchmark results table.
+    /// </summary>
+    public class PeakMemoryColumn : IColumn
+    {
+        public string Id => nameof(PeakMemoryColumn);
+        public string ColumnName => "Peak Memory (MB)";
+        public string Legend => "Peak working set memory during benchmark execution";
+        public UnitType UnitType => UnitType.Size;
+        public bool AlwaysShow => true;
+        public ColumnCategory Category => ColumnCategory.Custom;
+        public int PriorityInCategory => 0;
+        public bool IsNumeric => true;
+        public bool IsAvailable(Summary summary) => true;
+        public bool IsDefault(Summary summary, BenchmarkCase benchmarkCase) => false;
+
+        public string GetValue(Summary summary, BenchmarkCase benchmarkCase)
+        {
+            // Try CloudFetchRealE2EBenchmark (includes parameters in key)
+            if (benchmarkCase.Descriptor.Type == typeof(CloudFetchRealE2EBenchmark))
+            {
+                // Extract ReadDelayMs parameter
+                var readDelayParam = benchmarkCase.Parameters["ReadDelayMs"];
+                string key = $"ExecuteLargeQuery_{readDelayParam}";
+                if (CloudFetchRealE2EBenchmark.PeakMemoryResults.TryGetValue(key, out var peakMemoryMB))
+                {
+                    return $"{peakMemoryMB:F2}";
+                }
+            }
+
+            return "See previous console output";
+        }
+
+        public string GetValue(Summary summary, BenchmarkCase benchmarkCase, SummaryStyle style)
+        {
+            return GetValue(summary, benchmarkCase);
+        }
+
+        public override string ToString() => ColumnName;
+    }
+
+    /// <summary>
+    /// Configuration model for Databricks test configuration JSON file.
+    /// </summary>
+    internal class DatabricksTestConfig
+    {
+        public string? uri { get; set; }
+        public string? token { get; set; }
+        public string? query { get; set; }
+        public string? type { get; set; }
+        public string? catalog { get; set; }
+        public string? schema { get; set; }
+    }
+
+    /// <summary>
+    /// Real E2E performance benchmark for Databricks CloudFetch with an actual cluster.
+    ///
+    /// Prerequisites:
+    /// - Set DATABRICKS_TEST_CONFIG_FILE environment variable
+    /// - Config file should contain cluster connection details
+    ///
+    /// Run with: dotnet run -c Release --project Benchmarks/Benchmarks.csproj --framework net8.0 -- --filter "*CloudFetchRealE2E*" --job dry
+    ///
+    /// Measures:
+    /// - Peak memory usage
+    /// - Total allocations
+    /// - GC collections
+    /// - Query execution time
+    /// - Row processing throughput
+    ///
+    /// Parameters:
+    /// - ReadDelayMs: Fixed at 5 milliseconds per 10K rows to simulate Power BI consumption
+    /// </summary>
+    [MemoryDiagnoser]
+    [GcServer(true)]
+    [SimpleJob(warmupCount: 1, iterationCount: 3)]
+    [MinColumn, MaxColumn, MeanColumn, MedianColumn]
+    public class CloudFetchRealE2EBenchmark
+    {
+        // Static dictionary to store peak memory results for the custom column
+        public static readonly Dictionary<string, double> PeakMemoryResults = new Dictionary<string, double>();
+
+        private AdbcConnection? _connection;
+        private Process _currentProcess = null!;
+        private long _peakMemoryBytes;
+        private DatabricksTestConfig _testConfig = null!;
+        private string _hostname = null!;
+        private string _httpPath = null!;
+
+        [Params(5)] // Read delay in milliseconds per 10K rows (5 = simulate Power BI)
+        public int ReadDelayMs { get; set; }
+
+        [GlobalSetup]
+        public void GlobalSetup()
+        {
+            // Check if Databricks config is available
+            string? configFile = Environment.GetEnvironmentVariable("DATABRICKS_TEST_CONFIG_FILE");
+            if (string.IsNullOrEmpty(configFile))
+            {
+                throw new InvalidOperationException(
+                    "DATABRICKS_TEST_CONFIG_FILE environment variable must be set. " +
+                    "Set it to the path of your Databricks test configuration JSON file.");
+            }
+
+            // Read and parse config file
+            string configJson = File.ReadAllText(configFile);
+            _testConfig = JsonSerializer.Deserialize<DatabricksTestConfig>(configJson)
+                ?? throw new InvalidOperationException("Failed to parse config file");
+
+            if (string.IsNullOrEmpty(_testConfig.uri) || string.IsNullOrEmpty(_testConfig.token))
+            {
+                throw new InvalidOperationException("Config file must contain 'uri' and 'token' fields");
+            }
+
+            if (string.IsNullOrEmpty(_testConfig.query))
+            {
+                throw new InvalidOperationException("Config file must contain 'query' field");
+            }
+
+            // Parse URI to extract hostname and http_path
+            // Format: https://hostname/sql/1.0/warehouses/xxx
+            var uri = new Uri(_testConfig.uri);
+            _hostname = uri.Host;
+            _httpPath = uri.PathAndQuery;
+
+            _currentProcess = Process.GetCurrentProcess();
+            Console.WriteLine($"Loaded config from: {configFile}");
+            Console.WriteLine($"Hostname: {_hostname}");
+            Console.WriteLine($"HTTP Path: {_httpPath}");
+            Console.WriteLine($"Query: {_testConfig.query}");
+            Console.WriteLine($"Benchmark will test CloudFetch with {ReadDelayMs}ms per 10K rows read delay");
+        }
+
+        [IterationSetup]
+        public void IterationSetup()
+        {
+            // Create connection for this iteration using config values
+            var parameters = new Dictionary<string, string>
+            {
+                [AdbcOptions.Uri] = _testConfig.uri!,
+                [SparkParameters.Token] = _testConfig.token!,
+                [DatabricksParameters.UseCloudFetch] = "true",
+                [DatabricksParameters.EnableDirectResults] = "true",
+                [DatabricksParameters.CanDecompressLz4] = "true",
+                [DatabricksParameters.MaxBytesPerFile] = "10485760", // 10MB per file
+            };
+
+            var driver = new DatabricksDriver();
+            var database = driver.Open(parameters);
+            _connection = database.Connect(parameters);
+
+            // Reset peak memory tracking
+            GC.Collect(2, GCCollectionMode.Forced, blocking: true, compacting: false);
+            GC.WaitForPendingFinalizers();
+            GC.Collect(2, GCCollectionMode.Forced, blocking: true, compacting: false);
+            _currentProcess.Refresh();
+            _peakMemoryBytes = _currentProcess.WorkingSet64;
+        }
+
+        [IterationCleanup]
+        public void IterationCleanup()
+        {
+            _connection?.Dispose();
+            _connection = null;
+
+            // Print and store peak memory for this iteration
+            double peakMemoryMB = _peakMemoryBytes / 1024.0 / 1024.0;
+            Console.WriteLine($"CloudFetch E2E [Delay={ReadDelayMs}ms/10K rows] - Peak memory: {peakMemoryMB:F2} MB");
+
+            // Store in static dictionary for the custom column (key includes parameter)
+            string key = $"ExecuteLargeQuery_{ReadDelayMs}";
+            PeakMemoryResults[key] = peakMemoryMB;
+        }
+
+        /// <summary>
+        /// Execute a large query against Databricks and consume all result batches.
+        /// Simulates client behavior like Power BI reading data.
+        /// Uses the query from the config file.
+        /// </summary>
+        [Benchmark]
+        public async Task<long> ExecuteLargeQuery()
+        {
+            if (_connection == null)
+            {
+                throw new InvalidOperationException("Connection not initialized");
+            }
+
+            // Execute query from config file
+            var statement = _connection.CreateStatement();
+            statement.SqlQuery = _testConfig.query;
+
+            var result = await statement.ExecuteQueryAsync();
+            if (result.Stream == null)
+            {
+                throw new InvalidOperationException("Result stream is null");
+            }
+
+            // Read all batches and track peak memory
+            long totalRows = 0;
+            long totalBatches = 0;
+            RecordBatch? batch;
+
+            while ((batch = await result.Stream.ReadNextRecordBatchAsync()) != null)
+            {
+                totalRows += batch.Length;
+                totalBatches++;
+
+                // Track peak memory periodically
+                if (totalBatches % 10 == 0)
+                {
+                    TrackPeakMemory();
+                }
+
+                // Simulate Power BI processing delay if configured
+                // Delay is proportional to batch size: ReadDelayMs per 10K rows
+                if (ReadDelayMs > 0)
+                {
+                    int delayForBatch = (int)((batch.Length / 10000.0) * ReadDelayMs);
+                    if (delayForBatch > 0)
+                    {
+                        Thread.Sleep(delayForBatch);
+                    }
+                }
+
+                batch.Dispose();
+            }
+
+            // Final peak memory check
+            TrackPeakMemory();
+
+            statement.Dispose();
+            return totalRows;
+        }
+
+        private void TrackPeakMemory()
+        {
+            _currentProcess.Refresh();
+            long currentMemory = _currentProcess.WorkingSet64;
+            if (currentMemory > _peakMemoryBytes)
+            {
+                _peakMemoryBytes = currentMemory;
+            }
+        }
+
+        [GlobalCleanup]
+        public void GlobalCleanup()
+        {
+            GC.Collect(2, GCCollectionMode.Forced, blocking: true, compacting: true);
+            GC.WaitForPendingFinalizers();
+        }
+    }
+}
diff --git a/csharp/Benchmarks/Databricks/README.md b/csharp/Benchmarks/Databricks/README.md
new file mode 100644
index 000000000..e117e2be4
--- /dev/null
+++ b/csharp/Benchmarks/Databricks/README.md
@@ -0,0 +1,142 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Databricks CloudFetch E2E Benchmark
+
+Real end-to-end benchmark for measuring memory usage and performance of the Databricks CloudFetch implementation against an actual Databricks cluster.
+
+## Overview
+
+This benchmark tests the complete CloudFetch flow with real queries against a Databricks warehouse:
+- Full end-to-end CloudFetch flow (query execution, downloads, LZ4 decompression, batch consumption)
+- Real data from Databricks tables
+- Memory usage with actual network I/O
+- Power BI consumption simulation with batch-proportional delays
+
+## Benchmark
+
+### CloudFetchRealE2EBenchmark
+
+**Real end-to-end benchmark against actual Databricks cluster:**
+
+**Parameters:**
+- `ReadDelayMs`: Fixed at 5ms per 10K rows to simulate Power BI processing delays
+
+**Method:**
+- `ExecuteLargeQuery`: Executes the query specified in the config file, reads all batches with Power BI-like processing delays
+
+**Prerequisites:**
+- Set `DATABRICKS_TEST_CONFIG_FILE` environment variable pointing to your config JSON
+- Config file must contain:
+  - `uri`: Full Databricks warehouse URI (e.g., `https://hostname/sql/1.0/warehouses/xxx`)
+  - `token`: Databricks access token
+  - `query`: SQL query to execute (this will be run by the benchmark)
+
+## Running the Benchmark
+
+### Run the CloudFetch E2E benchmark:
+```bash
+cd csharp
+export DATABRICKS_TEST_CONFIG_FILE=/path/to/databricks-config.json
+dotnet run -c Release --project Benchmarks/Benchmarks.csproj --framework net8.0 -- --filter "*CloudFetchRealE2E*"
+```
+
+### Real E2E Benchmark Configuration
+
+Create a JSON config file with your Databricks cluster details:
+
+```json
+{
+  "uri": "https://your-workspace.cloud.databricks.com/sql/1.0/warehouses/xxx",
+  "token": "dapi...",
+  "query": "select * from main.tpcds_sf1_delta.catalog_sales",
+  "type": "databricks"
+}
+```
+
+Then set the environment variable:
+```bash
+export DATABRICKS_TEST_CONFIG_FILE=/path/to/databricks-config.json
+```
+
+**Note**: The `query` field specifies the SQL query that will be executed during the benchmark. Use a query that returns a large result set to properly test CloudFetch performance.
+
+## Understanding the Results
+
+### Key Metrics:
+
+- **Peak Memory (MB)**: Maximum working set memory during execution
+  - Printed to console output during each benchmark iteration
+  - Shows the real memory footprint during CloudFetch operations
+
+- **Allocated**: Total managed memory allocated during the operation
+  - Lower is better for memory efficiency
+
+- **Gen0/Gen1/Gen2**: Number of garbage collections
+  - Gen0: Frequent, low cost (short-lived objects)
+  - Gen1/Gen2: Less frequent, higher cost (longer-lived objects)
+  - LOH: Part of Gen2, objects >85KB
+
+- **Mean/Median**: Execution time statistics
+  - Shows the end-to-end time including query execution, CloudFetch downloads, LZ4 decompression, and batch consumption
+
+### Example Output
+
+**Console output during benchmark execution:**
+```
+Loaded config from: /path/to/databricks-config.json
+Hostname: adb-6436897454825492.12.azuredatabricks.net
+HTTP Path: /sql/1.0/warehouses/2f03dd43e35e2aa0
+Query: select * from main.tpcds_sf1_delta.catalog_sales
+Benchmark will test CloudFetch with 5ms per 10K rows read delay
+
+// Warmup
+CloudFetch E2E [Delay=5ms/10K rows] - Peak memory: 272.97 MB
+WorkloadWarmup   1: 1 op, 11566591709.00 ns, 11.5666 s/op
+
+// Actual iterations
+CloudFetch E2E [Delay=5ms/10K rows] - Peak memory: 249.11 MB
+WorkloadResult   1: 1 op, 8752445353.00 ns, 8.7524 s/op
+
+CloudFetch E2E [Delay=5ms/10K rows] - Peak memory: 261.95 MB
+WorkloadResult   2: 1 op, 9794630771.00 ns, 9.7946 s/op
+
+CloudFetch E2E [Delay=5ms/10K rows] - Peak memory: 258.39 MB
+WorkloadResult   3: 1 op, 9017280271.00 ns, 9.0173 s/op
+```
+
+**Summary table:**
+```
+BenchmarkDotNet v0.15.4, macOS Sequoia 15.7.1 (24G231) [Darwin 24.6.0]
+Apple M1 Max, 1 CPU, 10 logical and 10 physical cores
+.NET SDK 8.0.407
+  [Host] : .NET 8.0.19 (8.0.19, 8.0.1925.36514), Arm64 RyuJIT armv8.0-a
+
+| Method            | ReadDelayMs | Mean    | Min     | Max     | Median  | Peak Memory (MB)            | Gen0       | Gen1       | Gen2       | Allocated |
+|------------------ |------------ |--------:|--------:|--------:|--------:|----------------------------:|-----------:|-----------:|-----------:|----------:|
+| ExecuteLargeQuery | 5           | 9.19 s  | 8.75 s  | 9.79 s  | 9.02 s  | See previous console output | 28000.0000 | 28000.0000 | 28000.0000 |   1.78 GB |
+```
+
+**Key Metrics:**
+- **E2E Time**: 8.75-9.79 seconds (includes query execution, CloudFetch downloads, LZ4 decompression, batch consumption)
+- **Peak Memory**: 249-262 MB (tracked via Process.WorkingSet64, printed in console)
+- **Total Allocated**: 1.78 GB managed memory
+- **GC Collections**: 28K Gen0/Gen1/Gen2 collections
+
+**Note**: Peak memory values are printed to console during execution since BenchmarkDotNet runs each iteration in a separate process.
