[PR] feat(go/adbc/driver/databricks): implement Databricks ADBC driver with comprehensive test suite [arrow-adbc]

via GitHub Wed, 18 Jun 2025 17:56:05 -0700


jadewang-db opened a new pull request, #2998:
URL: https://github.com/apache/arrow-adbc/pull/2998


   ## Summary
   
     This PR introduces a new Databricks ADBC driver for Go that provides
     Arrow-native database connectivity to Databricks SQL warehouses. The
     driver is built as a wrapper around the `databricks-sql-go` library and
     implements all required ADBC interfaces.
   
     ## Changes
   
     ### Core Implementation
     - **Driver Implementation** (`driver.go`): Entry point with version
     tracking and configuration options
     - **Database Management** (`database.go`): Connection lifecycle management
     with comprehensive validation
     - **Connection Handling** (`connection.go`): Core connection implementation
     with metadata operations
     - **Statement Execution** (`statement.go`): SQL query execution with Arrow
     result conversion
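
To illustrate the kind of up-front validation described for `database.go`, here is a minimal sketch using only the Go standard library. The `Config` struct, its field names, and the choice of which options are required are all assumptions made for illustration; they are not taken from the PR's code.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// Config is a hypothetical container for the connection options named in
// the PR description. The driver's real option handling may differ.
type Config struct {
	Hostname string
	HTTPPath string
	Token    string
	Catalog  string
	Schema   string
	Timeout  time.Duration
}

// Validate collects every problem it finds instead of stopping at the
// first one, so a caller can fix a bad configuration in a single pass.
// Treating hostname, HTTP path, and token as required is an assumption.
func (c Config) Validate() error {
	var errs []error
	if strings.TrimSpace(c.Hostname) == "" {
		errs = append(errs, errors.New("hostname is required"))
	}
	if strings.TrimSpace(c.HTTPPath) == "" {
		errs = append(errs, errors.New("HTTP path is required"))
	}
	if c.Token == "" {
		errs = append(errs, errors.New("access token is required"))
	}
	if c.Timeout < 0 {
		errs = append(errs, errors.New("timeout must not be negative"))
	}
	// errors.Join returns nil when no errors were collected.
	return errors.Join(errs...)
}

func main() {
	// Placeholder values only; none of these identify a real workspace.
	cfg := Config{
		Hostname: "workspace.example.com",
		HTTPPath: "/sql/placeholder",
		Token:    "placeholder-token",
		Timeout:  30 * time.Second,
	}
	if err := cfg.Validate(); err != nil {
		fmt.Println("invalid configuration:", err)
		return
	}
	fmt.Println("configuration ok")
}
```

Reporting all failures together is one possible design choice; a driver could equally return on the first invalid option.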
   
     ### Key Features
     - ✅ **Complete ADBC Interface Compliance**: Implements all required Driver,
     Database, Connection, and Statement interfaces
     - ✅ **Arrow-Native Results**: Converts SQL result sets to Apache Arrow
     format for efficient data processing
     - ✅ **Comprehensive Configuration**: Supports all Databricks connection
     options (hostname, HTTP path, tokens, catalogs, schemas, timeouts)
     - ✅ **Metadata Discovery**: Implements catalog, schema, and table
     enumeration
     - ✅ **Type Mapping**: Full SQL-to-Arrow type conversion with proper null
     handling
     - ✅ **Error Handling**: Comprehensive error reporting with ADBC error codes
   
     ### Test Organization
     - **Moved all tests to dedicated `test/` subdirectory** for better
     organization
     - **Updated package structure** to use the `databricks_test` package with
     proper imports
     - **Comprehensive test coverage** including:
       - Unit tests for driver/database creation and validation
       - End-to-end integration tests with real Databricks connections
       - NYC taxi dataset verification (21,932 rows successfully processed)
       - Practical query tests for common SQL operations
       - ADBC validation test suite integration
   
     ### Performance & Verification
     - **Real Data Testing**: Successfully connects to Databricks and processes
     the NYC taxi dataset
     - **Performance Metrics**: Processes query results at 7-12 rows per millisecond
     - **Schema Discovery**: Handles 10+ catalogs, 1,600+ schemas, 900+ tables
     - **Type Safety**: Proper Arrow type mapping for all Databricks SQL types
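
For readers who want to reproduce a rows-per-millisecond figure, the sketch below shows one way such a rate could be computed. The function name `rowsPerMillisecond` and the 2-second elapsed time are hypothetical; the PR does not state how its numbers were measured or how long any query took.

```go
package main

import (
	"fmt"
	"time"
)

// rowsPerMillisecond returns the processing rate for a query that produced
// the given number of rows in the given wall-clock time. It returns 0 for
// a non-positive duration to avoid dividing by zero.
func rowsPerMillisecond(rows int64, elapsed time.Duration) float64 {
	if elapsed <= 0 {
		return 0
	}
	return float64(rows) / (float64(elapsed) / float64(time.Millisecond))
}

func main() {
	// Hypothetical timing: the row count matches the NYC taxi test, but
	// the 2-second duration is an assumption chosen for illustration.
	rate := rowsPerMillisecond(21932, 2*time.Second)
	fmt.Printf("%.1f rows/ms\n", rate)
}
```

A measurement like this would usually wrap the full read loop, from statement execution until the last record batch is consumed, so that fetch time is included.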
   
     ### Code Quality
     - ✅ **Pre-commit compliance**: All linting, formatting, and static analysis
     checks pass
     - ✅ **Error handling**: All error return values properly handled (errcheck
     compliant)
     - ✅ **Go formatting**: Consistent code formatting with `gofmt`
     - ✅ **License compliance**: Apache license headers on all files
   
     ## Testing
   
     The driver has been thoroughly tested with:
     - Real Databricks SQL warehouse connections
     - Large dataset processing (21,932 NYC taxi records)
     - All ADBC interface methods
     - Error handling and edge cases
     - Performance and memory usage
   
     All tests pass and demonstrate full functionality for production use.
   
     ## Breaking Changes
   
     None - this is a new driver implementation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


[PR] feat(go/adbc/driver/databricks): implement Databricks ADBC driver with comprehensive test suite [arrow-adbc]
