[PR] feat(mcp): PR8 - add security layer, type safety, and test infrastructure [superset]

via GitHub Fri, 24 Oct 2025 17:23:00 -0700


aminghadersohi opened a new pull request, #35841:
URL: https://github.com/apache/superset/pull/35841


   ## SUMMARY
   
   This PR adds critical security and testing infrastructure to complete the 
production-readiness of the MCP service. It introduces field-level permissions 
filtering, type-safe DAO protocols, and comprehensive test infrastructure for 
authentication and server startup validation.
   
   **New Features Added:**
   
   ### Security Layer (313 lines)
   
   **permissions_utils.py** - Field-level permissions and data filtering:
   - Defines sensitive fields by object type (dataset, chart, dashboard)
   - Protects SQL queries, connection strings, cache keys, internal IDs
   - Role-based access control for sensitive field access
   - Permission-aware serialization with automatic filtering
   - Security functions:
     - `get_sensitive_fields(object_type)` - Get sensitive field names
     - `has_permission_for_field(user, field, object_type)` - Check field 
permissions
     - `filter_sensitive_fields(data, user, object_type)` - Auto-filter based 
on permissions
     - `filter_sensitive_fields_pydantic(model, user, object_type)` - Pydantic 
model filtering
   
   **Protected fields:**
   - **Dataset**: sql, extra, database_id, changed_by_fk, created_by_fk
   - **Chart**: query_context, cache_key, changed_by_fk, created_by_fk
   - **Dashboard**: json_metadata, position_json, css, changed_by_fk, 
created_by_fk
   
   **Permission requirements:**
   - Admin users: Can access all sensitive fields
   - Users with specific permissions: Can access field if they have the 
required permission
   - Regular users: Sensitive fields automatically filtered out
   
   ### Type Safety (59 lines)
   
   **dao/base.py** - DAO Protocol for consistent data access:
   - Protocol definition ensuring type safety across all model tools
   - Standard interface for Data Access Objects
   - Required methods:
     - `list()` - Paginated listing with filters, search, and ordering
     - `find_by_id()` - Retrieve single record by ID
     - `get_filterable_columns_and_operators()` - Get available filters
     - `count()` - Total record count (for instance info)
   - Benefits:
     - Type checking at development time
     - Consistent interfaces across chart/dashboard/dataset DAOs
     - Better IDE support and autocomplete
     - Prevents interface drift
   
   ### Test Infrastructure (626 lines)
   
   **conftest.py (129 lines)** - Shared test fixtures and mocks:
   - Auto-mocking fixtures for Superset dependencies
   - Mock Flask app with security manager
   - Mock user fixtures (admin, regular user, anonymous)
   - Mock DAOs (ChartDAO, DashboardDAO, DatasetDAO, DatabaseDAO)
   - Mock command classes for all tool operations
   - Reusable test utilities across all MCP tests
   
   **test_auth.py (317 lines)** - Authentication unit tests:
   - JWT token validation and parsing
   - User resolution from tokens
   - User impersonation support
   - Permission checking
   - Audit logging validation
   - Payload sanitization
   - Error handling for invalid tokens
   - Auth provider configuration
   - Test coverage:
     - Valid token authentication
     - Invalid token handling
     - Expired token handling
     - Malformed token handling
     - User impersonation flows
     - Permission-based access control
   
   **test_auth_integration.py (147 lines)** - JWT integration tests:
   - End-to-end JWT authentication workflows
   - FastMCP integration with auth layer
   - Real token generation and validation
   - User context propagation through tool calls
   - Permission enforcement in tool execution
   - Test scenarios:
     - Tool calls with valid authentication
     - Unauthenticated requests (rejected)
     - Permission-based tool access
     - User impersonation workflows
   
   **test_server_startup.py (33 lines)** - Server startup verification:
   - MCP server initialization tests
   - Configuration validation
   - Service health checks
   - Error handling during startup
   
   **Statistics:**
   - **Files Added**: 7 (2 production, 1 test config, 4 test files)
   - **Lines Added**: ~998
     - permissions_utils.py: 313 lines (security layer)
     - dao/base.py: 59 lines (type safety)
     - conftest.py: 129 lines (test fixtures)
     - test_auth.py: 317 lines (auth unit tests)
     - test_auth_integration.py: 147 lines (auth integration tests)
     - test_server_startup.py: 33 lines (server startup tests)
   - **Test Coverage**: Authentication, authorization, server startup, 
permission filtering
   - **All pre-commit hooks passing**
   
   **Builds on:**
   - PR #35163 (MCP service scaffold)
   - PR2 (chart listing and info tools)
   - PR3 (dashboard and dataset listing and info tools)
   - PR4 (advanced chart tools and SQL Lab integration)
   - PR5 (dashboard generation and chart placement tools)
   - PR6 (instance info tool, prompts, and resources)
   - PR7 (restore tool documentation)
   
   ## BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
   
   N/A - This is a backend security and testing infrastructure enhancement.
   
   ## TESTING INSTRUCTIONS
   
   ### Prerequisites
   1. Ensure you have a running Superset instance with the MCP service enabled
   2. Ensure you have the development environment set up (Python venv activated)
   3. Have pytest installed in your environment
   
   ### Running the Test Suite
   
   ```bash
   # Activate your virtual environment
   source venv/bin/activate  # or your venv path
   
   # Run all MCP service tests
   pytest tests/unit_tests/mcp_service/ -v
   
   # Run specific test files
   pytest tests/unit_tests/mcp_service/system/test_auth.py -v
   pytest tests/unit_tests/mcp_service/system/test_auth_integration.py -v
   pytest tests/unit_tests/mcp_service/system/test_server_startup.py -v
   
   # Run with coverage
   pytest tests/unit_tests/mcp_service/ --cov=superset.mcp_service 
--cov-report=term-missing
   ```
   
   ### Expected Test Results
   
   **test_auth.py** (317 lines):
   - Should see ~15-20 passing tests for:
     - JWT token validation
     - User resolution
     - Permission checking
     - Audit logging
     - Error handling
   
   **test_auth_integration.py** (147 lines):
   - Should see ~8-10 passing tests for:
     - End-to-end JWT workflows
     - Tool authentication
     - Permission enforcement
   
   **test_server_startup.py** (33 lines):
   - Should see ~2-3 passing tests for:
     - Server initialization
     - Configuration validation
   
   ### Manual Security Testing
   
   Test the permissions filtering manually:
   
   ```python
   # In a Python shell with Superset context
   from superset.mcp_service.utils.permissions_utils import (
       filter_sensitive_fields,
       get_sensitive_fields,
   )
   from superset import security_manager
   
   # Get sensitive fields for datasets
   sensitive = get_sensitive_fields("dataset")
   print(f"Sensitive dataset fields: {sensitive}")
   # Expected: {'sql', 'extra', 'database_id', 'changed_by_fk', 'created_by_fk'}
   
   # Test filtering with a mock user (requires Superset context)
   user = security_manager.find_user(username="admin")
   dataset_data = {
       "id": 1,
       "name": "My Dataset",
       "sql": "SELECT * FROM sensitive_table",  # Should be filtered for 
non-admin
       "table_name": "public_table",
   }
   
   filtered = filter_sensitive_fields(dataset_data, user, "dataset")
   print(filtered)
   # For admin: All fields present
   # For regular user: 'sql' field removed
   ```
   
   ### Pre-Commit Validation
   
   ```bash
   # Stage all files
   git add -A
   
   # Run pre-commit hooks on changed files
   pre-commit run --files \
     superset/mcp_service/utils/permissions_utils.py \
     superset/mcp_service/dao/base.py \
     superset/mcp_service/dao/__init__.py \
     tests/unit_tests/mcp_service/conftest.py \
     tests/unit_tests/mcp_service/system/test_auth.py \
     tests/unit_tests/mcp_service/system/test_auth_integration.py \
     tests/unit_tests/mcp_service/system/test_server_startup.py
   
   # Expected: All hooks pass (mypy, ruff, pylint, etc.)
   ```
   
   ### Verification Checklist
   
   - [ ] All authentication tests pass
   - [ ] All integration tests pass
   - [ ] Server startup tests pass
   - [ ] permissions_utils.py correctly identifies sensitive fields
   - [ ] Field filtering works for different user roles
   - [ ] DAO Protocol type checking works in IDE
   - [ ] conftest.py fixtures are available to other test files
   - [ ] Pre-commit hooks pass
   
   ## ADDITIONAL INFORMATION
   
   - [ ] Has associated issue: No
   - [ ] Required feature flags: No
   - [ ] Changes UI: No
   - [ ] Includes DB Migration: No
   - [x] Introduces new feature or API: Yes (security layer, type safety, test 
infrastructure)
   - [ ] Removes existing feature or API: No
   
   **Completes Production-Ready Security:**
   
   This PR represents a critical security and quality milestone for the MCP 
service:
   
   1. **Field-level Security** - Protects sensitive data from unauthorized 
access
   2. **Type Safety** - Ensures consistent interfaces across all data access
   3. **Test Coverage** - Validates authentication, authorization, and startup
   4. **Production Ready** - With these additions, the service is ready for 
production deployment
   
   **Files Changed Summary:**
   
   **New Security (2 files):**
   - superset/mcp_service/utils/permissions_utils.py (313 lines)
   - superset/mcp_service/dao/base.py (59 lines)
   - superset/mcp_service/dao/__init__.py (22 lines - exports)
   
   **New Test Infrastructure (4 files):**
   - tests/unit_tests/mcp_service/conftest.py (129 lines)
   - tests/unit_tests/mcp_service/system/test_auth.py (317 lines)
   - tests/unit_tests/mcp_service/system/test_auth_integration.py (147 lines)
   - tests/unit_tests/mcp_service/system/test_server_startup.py (33 lines)
   
   **Key Security Features:**
   
   1. **permissions_utils.py**:
      - Centralized sensitive field definitions by object type
      - Permission-aware filtering functions
      - Support for both dict and Pydantic model filtering
      - Role-based access control integration
      - Security by default (deny access unless explicitly permitted)
   
   2. **Field-level permissions**:
      - Protects SQL queries from unauthorized viewing
      - Hides database connection strings and credentials
      - Filters internal IDs and references
      - Removes cache keys and sensitive metadata
      - Preserves public fields for all users
   
   3. **dao/base.py Protocol**:
      - Type-safe interface for all DAOs
      - Consistent method signatures
      - Better IDE support and type checking
      - Prevents interface drift across modules
      - Required for instance info statistics
   
   **Test Infrastructure Highlights:**
   
   1. **conftest.py**:
      - Auto-mocking strategy for all Superset dependencies
      - Prevents accidental database access in unit tests
      - Reusable fixtures across all MCP tests
      - Mock users with different permission levels
      - Mock DAOs for chart/dashboard/dataset operations
   
   2. **Authentication Tests**:
      - Comprehensive JWT validation coverage
      - User resolution and impersonation
      - Permission checking workflows
      - Error handling for all failure scenarios
      - Audit logging validation
   
   3. **Integration Tests**:
      - End-to-end authentication workflows
      - FastMCP integration validation
      - Tool execution with auth context
      - Permission enforcement in real scenarios
   
   **Architecture Benefits:**
   
   The security layer uses a declarative approach:
   
   ```python
   # Define sensitive fields once
   SENSITIVE_FIELDS = {
       "dataset": {"sql", "extra", "database_id", ...},
       "chart": {"query_context", "cache_key", ...},
       "dashboard": {"json_metadata", "position_json", ...},
   }
   
   # Auto-filter based on user permissions
   filtered_data = filter_sensitive_fields(data, user, "dataset")
   ```
   
   This design enables:
   - Centralized security policy management
   - Easy addition of new sensitive fields
   - Consistent filtering across all tools
   - Automatic protection with minimal code changes
   - Clear audit trail of what's protected
   
   **Pattern Compliance:**
   
   This PR continues the **MINIMAL cherry-pick pattern** established in 
previous PRs:
   - ✅ Copied ONLY the required security and test files
   - ✅ NO development utilities or extra infrastructure
   - ✅ Clean separation of concerns (security, type safety, tests)
   - ✅ All pre-commit hooks passing
   - ✅ Production-ready code quality
   
   **Risk Mitigation:**
   
   Without this PR, the MCP service would have:
   - ❌ No field-level security (sensitive data exposed)
   - ❌ No type safety guarantees (potential runtime errors)
   - ❌ No authentication test coverage (security vulnerabilities)
   - ❌ No server startup validation (deployment failures)
   
   With this PR:
   - ✅ Field-level security with role-based filtering
   - ✅ Type-safe DAO interfaces with Protocol
   - ✅ Comprehensive authentication test coverage
   - ✅ Server startup and configuration validation
   
   **Next Steps:**
   
   The MCP service is now feature-complete AND production-ready. Remaining 
tasks:
   - Code review for security layer
   - Final integration testing
   - Deployment planning
   - Documentation updates
   
   Future enhancements may include:
   - Row-level security integration
   - Fine-grained permission policies
   - Additional test coverage for module-specific tools
   - Performance testing under load
   - Security audit and penetration testing
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[PR] feat(mcp): PR8 - add security layer, type safety, and test infrastructure [superset]

Reply via email to