msrathore-db commented on code in PR #3637:
URL: https://github.com/apache/arrow-adbc/pull/3637#discussion_r2471247272


##########
csharp/src/Drivers/Databricks/Reader/CloudFetch/straggler-mitigation-integration-v2.md:
##########
@@ -0,0 +1,607 @@
+# Straggler Download Mitigation - Integration Guide
+
+## Overview
+
+This document provides integration guidance for straggler download mitigation 
in the ADBC CloudFetch system. It focuses on **class contracts, interfaces, and 
interaction patterns** rather than implementation details.
+
+**Design Principle:** Minimal changes to existing architecture - integrate 
seamlessly with CloudFetchDownloader's existing retry mechanism.
+
+---
+
+## 1. Architecture Overview
+
+### 1.1 Component Diagram
+
+```mermaid
+classDiagram
+    class ICloudFetchDownloader {
+        <<interface>>
+        +StartAsync(CancellationToken) Task
+        +StopAsync() Task
+        +GetNextDownloadedFileAsync(CancellationToken) Task~IDownloadResult~
+    }
+
+    class CloudFetchDownloader {
+        -ITracingStatement _statement
+        -SemaphoreSlim _downloadSemaphore
+        -int _maxRetries
+        -StragglerDownloadDetector _stragglerDetector
+        -ConcurrentDictionary~long,FileDownloadMetrics~ _activeDownloadMetrics
+        -ConcurrentDictionary~long,CancellationTokenSource~ _perFileTokens
+        +StartAsync(CancellationToken) Task
+        -DownloadFileAsync(IDownloadResult, CancellationToken) Task
+        -MonitorForStragglerDownloadsAsync(CancellationToken) Task
+    }
+
+    class FileDownloadMetrics {
+        +long FileOffset
+        +long FileSizeBytes
+        +DateTime DownloadStartTime
+        +DateTime? DownloadEndTime
+        +bool IsDownloadCompleted
+        +bool WasCancelledAsStragler
+        +CalculateThroughputBytesPerSecond() double?
+        +MarkDownloadCompleted() void
+        +MarkCancelledAsStragler() void
+    }
+
+    class StragglerDownloadDetector {
+        -double _stragglerThroughputMultiplier
+        -double _minimumCompletionQuantile
+        -TimeSpan _stragglerDetectionPadding
+        -int _maxStragglersBeforeFallback
+        -int _totalStragglersDetectedInQuery
+        +bool ShouldFallbackToSequentialDownloads
+        +IdentifyStragglerDownloads(IReadOnlyList~FileDownloadMetrics~, 
DateTime) IEnumerable~long~
+        +GetTotalStragglersDetectedInQuery() int
+    }
+
+    class IActivityTracer {
+        <<interface>>
+        +ActivityTrace Trace
+        +string? TraceParent
+    }
+
+    ICloudFetchDownloader <|.. CloudFetchDownloader
+    IActivityTracer <|.. CloudFetchDownloader
+    CloudFetchDownloader --> FileDownloadMetrics : tracks
+    CloudFetchDownloader --> StragglerDownloadDetector : uses
+```
+
+### 1.2 Key Integration Points
+
+| Component | Change Type | Description |
+|-----------|-------------|-------------|
+| **DatabricksParameters** | New constants | Add 6 configuration parameters |
+| **CloudFetchDownloader** | Modified | Add straggler tracking and monitoring |
+| **FileDownloadMetrics** | New class | Track per-file download performance |
+| **StragglerDownloadDetector** | New class | Identify stragglers using median 
throughput |
+
+---
+
+## 2. Class Contracts
+
+### 2.1 FileDownloadMetrics
+
+**Purpose:** Track timing and throughput for individual file downloads.
+
+**Public Contract:**
+```csharp
+internal class FileDownloadMetrics
+{
+    // Read-only properties
+    public long FileOffset { get; }
+    public long FileSizeBytes { get; }
+    public DateTime DownloadStartTime { get; }
+    public DateTime? DownloadEndTime { get; }
+    public bool IsDownloadCompleted { get; }
+    public bool WasCancelledAsStragler { get; }
+
+    // Constructor
+    public FileDownloadMetrics(long fileOffset, long fileSizeBytes);
+
+    // Methods
+    public double? CalculateThroughputBytesPerSecond();
+    public void MarkDownloadCompleted();
+    public void MarkCancelledAsStragler();
+}
+```
+
+**Behavior:**
+- Captures start time on construction
+- Calculates throughput as `fileSize / elapsedSeconds`
+- Immutable file metadata (offset, size)
+- State transitions: In Progress → Completed OR Cancelled
+
+---
+
+### 2.2 StragglerDownloadDetector
+
+**Purpose:** Encapsulate straggler identification logic.
+
+**Public Contract:**
+```csharp
+internal class StragglerDownloadDetector
+{
+    // Read-only property
+    public bool ShouldFallbackToSequentialDownloads { get; }
+
+    // Constructor
+    public StragglerDownloadDetector(
+        double stragglerThroughputMultiplier,
+        double minimumCompletionQuantile,
+        TimeSpan stragglerDetectionPadding,
+        int maxStragglersBeforeFallback);
+
+    // Core detection method
+    public IEnumerable<long> IdentifyStragglerDownloads(
+        IReadOnlyList<FileDownloadMetrics> allDownloadMetrics,
+        DateTime currentTime);
+
+    // Query metrics
+    public int GetTotalStragglersDetectedInQuery();
+}
+```
+
+**Detection Algorithm:**
+```
+1. Wait for minimumCompletionQuantile (e.g., 60%) to complete
+2. Calculate median throughput from completed downloads
+3. For each active download:
+   - Calculate expected time: (multiplier × fileSize / medianThroughput) + 
padding
+   - If elapsed > expected: mark as straggler
+4. Track total stragglers for fallback decision
+```
+
+---
+
+### 2.3 CloudFetchDownloader Modifications
+
+**New Fields:**
+```csharp
+// Straggler mitigation state
+private readonly bool _isStragglerMitigationEnabled;
+private readonly StragglerDownloadDetector? _stragglerDetector;
+private readonly ConcurrentDictionary<long, FileDownloadMetrics>? 
_activeDownloadMetrics;
+private readonly ConcurrentDictionary<long, CancellationTokenSource>? 
_perFileTokens;
+
+// Background monitoring
+private Task? _stragglerMonitoringTask;
+private CancellationTokenSource? _stragglerMonitoringCts;
+
+// Fallback state
+private volatile bool _hasTriggeredSequentialDownloadFallback;
+```
+
+**Modified Methods:**
+- `StartAsync()` - Start background monitoring task
+- `StopAsync()` - Stop and cleanup monitoring task
+- `DownloadFileAsync()` - Integrate straggler cancellation handling into retry 
loop
+
+**New Methods:**
+- `MonitorForStragglerDownloadsAsync()` - Background task checking for 
stragglers every 2s
+- `TriggerSequentialDownloadFallback()` - Reduce parallelism to 1
+
+---
+
+## 3. Interaction Flows
+
+### 3.1 Initialization Sequence
+
+```mermaid
+sequenceDiagram
+    participant CM as CloudFetchDownloadManager
+    participant CD as CloudFetchDownloader
+    participant SD as StragglerDownloadDetector
+    participant MT as MonitoringTask
+
+    CM->>CD: new CloudFetchDownloader(...)
+    CD->>CD: Parse straggler config params
+    alt Mitigation Enabled
+        CD->>SD: new StragglerDownloadDetector(...)
+        CD->>CD: Initialize _activeDownloadMetrics
+        CD->>CD: Initialize _perFileTokens
+    end
+
+    CM->>CD: StartAsync()
+    CD->>CD: Start download task
+    alt Mitigation Enabled
+        CD->>MT: Start MonitorForStragglerDownloadsAsync()
+        activate MT
+        MT->>MT: Loop every 2s
+    end
+```
+
+### 3.2 Download with Straggler Detection

Review Comment:
   I've used the first approach. Updated the doc accordingly



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to