eric-wang-1990 opened a new pull request, #3580:
URL: https://github.com/apache/arrow-adbc/pull/3580
## Summary
This PR implements comprehensive Activity-based distributed tracing for the
CloudFetch download pipeline in the Databricks C# driver, enabling real-time
monitoring, structured logging, and improved observability.
### Key Changes:
- Add Activity-based tracing to CloudFetchDownloader with TraceActivityAsync
- Create child Activities per individual file download for real-time
progress visibility
- Replace all Trace.TraceInformation/Trace.TraceError calls with
Activity.AddEvent for structured logging
- Add Activity tags for searchable metadata (offset, URL, file sizes)
- Implement proper Activity context flow through async/await chains
- Update CloudFetchDownloadManager to pass statement for tracing context
- Update all tests to pass the new statement parameter to the
CloudFetchDownloader constructor
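
As a rough illustration of the `Trace.TraceInformation` to `Activity.AddEvent` change listed above, here is a minimal sketch using only `System.Diagnostics`. The source name `Example.CloudFetch`, the tag keys, and the URL are assumptions made for the example; the driver's own `TraceActivityAsync` helper is not shown because its signature is not given in this description.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical source name; the driver's real ActivitySource name is not stated here.
var source = new ActivitySource("Example.CloudFetch");
var seen = new List<string>();

// StartActivity returns null unless a listener samples the source, so register one.
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Example.CloudFetch",
    Sample = (ref ActivityCreationOptions<ActivityContext> _) =>
        ActivitySamplingResult.AllDataAndRecorded,
    ActivityStopped = a =>
    {
        foreach (var e in a.Events)
        {
            seen.Add(e.Name);
            Console.WriteLine($"{a.OperationName}: {e.Name}");
        }
    },
};
ActivitySource.AddActivityListener(listener);

using (var activity = source.StartActivity("DownloadFile"))
{
    // Before: Trace.TraceInformation($"Downloading file at offset {offset}");
    // After: searchable tags on the Activity plus a named, structured event.
    activity?.SetTag("cloudfetch.offset", 0L); // assumed tag key
    activity?.AddEvent(new ActivityEvent(
        "cloudfetch.download_start",
        tags: new ActivityTagsCollection
        {
            ["offset"] = 0L,
            ["url"] = "https://example.invalid/file-1", // placeholder URL
        }));
}
```

The null-conditional calls matter: when no listener is attached, `StartActivity` returns null and the tracing code becomes a no-op.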
### Architecture:
The implementation follows a hierarchical Activity structure:
```
Statement Activity (parent)
├─ DownloadFilesAsync Activity (overall batch)
│ ├─ DownloadFile Activity (file 1) - flushes when complete
│ ├─ DownloadFile Activity (file 2) - flushes when complete
│ └─ ...
└─ ReadNextRecordBatchAsync Activity (reader operations)
```
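
The hierarchy above relies on `Activity.Current` flowing through async calls, so a child started inside an awaited method picks up the caller's Activity as its parent. The sketch below shows that behaviour with plain `ActivitySource` calls; the source name, the `file.index` tag, and the `Task.Delay` stand-in for the HTTP download are assumptions, not the driver's code.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

var source = new ActivitySource("Example.CloudFetch"); // hypothetical name
var stopped = new List<string>();

using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Example.CloudFetch",
    Sample = (ref ActivityCreationOptions<ActivityContext> _) =>
        ActivitySamplingResult.AllDataAndRecorded,
    // Invoked as each Activity stops, so a finished file is reported
    // before the enclosing batch Activity ends.
    ActivityStopped = a =>
    {
        var line = $"{a.OperationName} (parent: {a.Parent?.OperationName ?? "none"})";
        lock (stopped) stopped.Add(line);
        Console.WriteLine("stopped " + line);
    },
};
ActivitySource.AddActivityListener(listener);

async Task DownloadFileAsync(int index)
{
    // Activity.Current is async-local, so this child inherits the caller's Activity.
    using var file = source.StartActivity("DownloadFile");
    file?.SetTag("file.index", index); // assumed tag key
    await Task.Delay(10);              // stand-in for the real HTTP download
}

using (source.StartActivity("Statement"))
using (source.StartActivity("DownloadFilesAsync"))
{
    await Task.WhenAll(DownloadFileAsync(1), DownloadFileAsync(2));
}
```

In this sketch both `DownloadFile` Activities stop, and reach the listener, before `DownloadFilesAsync` and `Statement` do, which is the per-file flushing behaviour the diagram describes.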
### Benefits:
- **Real-time progress monitoring**: Each file's events flush as soon as that
file's Activity completes, rather than at the end of the batch
- **Better fault tolerance**: Downloads that have already completed are logged
even if the process crashes later
- **Improved debuggability**: Searchable Activity tags enable filtering by
offset, URL, size
- **Granular metrics**: Per-file download times, throughput, compression
ratios visible in logs
- **OpenTelemetry-compatible**: Activities use the standard
System.Diagnostics.Activity API, which OpenTelemetry for .NET consumes
### Events Logged:
- `cloudfetch.download_start` - File download initiated
- `cloudfetch.content_length` - Actual file size from HTTP response
- `cloudfetch.download_retry` - Retry attempt with reason
- `cloudfetch.url_refreshed_before_download` - URL refreshed proactively
- `cloudfetch.url_refreshed_after_auth_error` - URL refreshed after 401/403
- `cloudfetch.decompression_complete` - LZ4 decompression metrics
- `cloudfetch.download_complete` - Download success with throughput
- `cloudfetch.download_failed_all_retries` - Final failure after all retries
- `cloudfetch.download_summary` - Overall batch statistics
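
To show how a few of these events could fit together, here is a hedged sketch of a retry loop that emits `cloudfetch.download_start`, `cloudfetch.download_retry`, `cloudfetch.download_complete`, and `cloudfetch.download_failed_all_retries`. The retry limit, the tag keys, the `DownloadWithRetryAsync` helper, and the simulated failure are all assumptions; the driver's actual retry logic is not reproduced here. `SetStatus` requires .NET 6 or a matching System.Diagnostics.DiagnosticSource package.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

var source = new ActivitySource("Example.CloudFetch"); // hypothetical name
var events = new List<string>();

using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Example.CloudFetch",
    Sample = (ref ActivityCreationOptions<ActivityContext> _) =>
        ActivitySamplingResult.AllDataAndRecorded,
    ActivityStopped = a =>
    {
        foreach (var e in a.Events) { events.Add(e.Name); Console.WriteLine(e.Name); }
    },
};
ActivitySource.AddActivityListener(listener);

const int maxRetries = 3; // assumed limit, not taken from the driver

// Hypothetical helper: runs one download attempt at a time and records events.
async Task<bool> DownloadWithRetryAsync(Func<Task> attemptDownload)
{
    using var activity = source.StartActivity("DownloadFile");
    activity?.AddEvent(new ActivityEvent("cloudfetch.download_start"));
    for (int attempt = 1; attempt <= maxRetries; attempt++)
    {
        try
        {
            await attemptDownload();
            activity?.AddEvent(new ActivityEvent("cloudfetch.download_complete",
                tags: new ActivityTagsCollection { ["attempts"] = attempt }));
            return true;
        }
        catch (Exception ex) when (attempt < maxRetries)
        {
            // Retries remain: record the reason and loop again.
            activity?.AddEvent(new ActivityEvent("cloudfetch.download_retry",
                tags: new ActivityTagsCollection { ["attempt"] = attempt, ["reason"] = ex.Message }));
        }
        catch (Exception ex)
        {
            // Last attempt failed: record the final failure and mark the Activity as errored.
            activity?.AddEvent(new ActivityEvent("cloudfetch.download_failed_all_retries",
                tags: new ActivityTagsCollection { ["reason"] = ex.Message }));
            activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
            return false;
        }
    }
    return false;
}

// Usage: the first attempt fails, the second succeeds.
int calls = 0;
bool ok = await DownloadWithRetryAsync(() =>
{
    calls++;
    if (calls < 2) throw new InvalidOperationException("transient failure");
    return Task.CompletedTask;
});
```

With the single simulated failure, the listener reports the start, retry, and complete events in that order once the `DownloadFile` Activity stops.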
## Test Plan
- ✅ All existing CloudFetchDownloader E2E tests pass (7 test methods)
- ✅ Build succeeds with 0 warnings
- Manual testing: Query with CloudFetch enabled and verify Activity events
in logs
- Verified Activity context flows correctly through async/await chains
- Confirmed child Activities flush independently upon completion
🤖 Generated with [Claude Code](https://claude.com/claude-code)