[PR] feat: Add Apache Spark integration to OpenServerless [openserverless]

via GitHub Sat, 22 Nov 2025 08:04:05 -0800


mobs75 opened a new pull request, #188:
URL: https://github.com/apache/openserverless/pull/188


   ## Summary
   
   This PR adds Apache Spark integration to OpenServerless, so users can 
deploy and manage a standalone Spark cluster alongside their serverless 
workloads for big data processing.
   
   ## Architecture
   
   The integration is implemented in the **operator submodule** and follows 
OpenServerless patterns:
   - Spark deployment managed by Kubernetes operator
   - Seamless integration with existing OpenServerless components (MinIO, 
PostgreSQL, MongoDB, Redis)
   - Declarative configuration through Whisk CRD
   
   ## Key Features
   
   ### Spark Components
   - ✅ **Spark Master**: Standalone cluster manager with configurable resources
   - ✅ **Spark History Server**: Web UI for completed applications with 
S3-compatible storage
   - ✅ **Spark Workers**: Foundation in place; dynamic scaling is not yet 
implemented
   
   ### Technical Implementation
   - **Resource Management**: Converts memory values between the Kubernetes 
format (`1Gi`) and the JVM format (`1g`)
   - **Service Discovery**: Automatic DNS configuration for inter-component 
communication
   - **Storage Integration**: MinIO S3-compatible storage for Spark event logs
   - **RBAC**: Least-privilege access via a ServiceAccount and Role bindings
   - **Health Checks**: Readiness and liveness probes
   - **Lifecycle Management**: Owner references for automatic cleanup
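
The memory format handling listed under Resource Management can be illustrated with a minimal sketch. This is not the code from `nuvolaris/spark.py`; the function names `k8s_to_jvm` and `jvm_to_k8s` are hypothetical, and the sketch assumes only integer quantities with binary suffixes (`Ki`, `Mi`, `Gi`, `Ti`) need to be supported. Decimal suffixes such as `G` are rejected because they do not correspond exactly to the binary units that Spark and the JVM use.

```python
import re

# Kubernetes binary suffixes mapped to JVM/Spark-style suffixes (assumed scope).
_K8S_TO_JVM = {"Ki": "k", "Mi": "m", "Gi": "g", "Ti": "t"}
_JVM_TO_K8S = {jvm: k8s for k8s, jvm in _K8S_TO_JVM.items()}


def k8s_to_jvm(quantity: str) -> str:
    """Convert a Kubernetes quantity such as '1Gi' to a JVM size such as '1g'."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti)", quantity.strip())
    if match is None:
        raise ValueError(f"unsupported Kubernetes memory quantity: {quantity!r}")
    number, suffix = match.groups()
    return f"{number}{_K8S_TO_JVM[suffix]}"


def jvm_to_k8s(size: str) -> str:
    """Convert a JVM size such as '1g' to a Kubernetes quantity such as '1Gi'."""
    match = re.fullmatch(r"(\d+)([kmgt])", size.strip().lower())
    if match is None:
        raise ValueError(f"unsupported JVM memory size: {size!r}")
    number, suffix = match.groups()
    return f"{number}{_JVM_TO_K8S[suffix]}"


if __name__ == "__main__":
    print(k8s_to_jvm("1Gi"))
    print(jvm_to_k8s("512m"))
```

Rejecting unknown formats with a `ValueError` keeps a misconfigured value from silently reaching the Spark process with a different size than the container limit.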
   
   ## Changes
   
   ### Operator Submodule (commit `afc74b4`)
   - **New Module**: `nuvolaris/spark.py` - Complete Spark operator 
implementation
   - **Templates**: Kubernetes manifests for RBAC, ConfigMaps, Services, 
StatefulSets
   - **Integration**: Hooks into main operator workflow (`patcher.py`, 
`main.py`)
   
   ### Configuration
   
   ```yaml
   apiVersion: nuvolaris.org/v1
   kind: Whisk
   metadata:
     name: controller
   spec:
     components:
       spark: true
     spark:
       enabled: true
       mode: standalone
       image: apache/spark:3.5.0
       master:
         memory: 1Gi
         cpu: 1000m
       history:
         enabled: true
         backend: s3a
         s3a:
           bucket: spark-history
           endpoint: http://minio.nuvolaris.svc.cluster.local:9000
           secretRef: nuvolaris-minio
   ```
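
As a rough illustration of how the `history` section above could map to Spark settings, the sketch below derives event-log properties from the `spark` part of the spec. The function name `history_server_properties` is hypothetical and this is not the operator's actual logic. The property keys are standard Spark and Hadoop S3A settings; enabling path-style access is an assumption that is typical for MinIO. Credentials from `secretRef` are deliberately left out.

```python
def history_server_properties(spark_spec: dict) -> dict:
    """Build Spark properties for S3A-backed event logs (illustrative only)."""
    history = spark_spec.get("history", {})
    if not history.get("enabled", False) or history.get("backend") != "s3a":
        return {}

    s3a = history["s3a"]
    log_dir = f"s3a://{s3a['bucket']}/"
    endpoint = s3a["endpoint"]
    use_ssl = endpoint.startswith("https://")

    return {
        # Applications write event logs here ...
        "spark.eventLog.enabled": "true",
        "spark.eventLog.dir": log_dir,
        # ... and the History Server reads them from the same location.
        "spark.history.fs.logDirectory": log_dir,
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        # Assumption: MinIO is addressed with path-style URLs.
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.connection.ssl.enabled": str(use_ssl).lower(),
    }


if __name__ == "__main__":
    # Mirrors the `spec.spark` section of the YAML above.
    spec = {
        "enabled": True,
        "history": {
            "enabled": True,
            "backend": "s3a",
            "s3a": {
                "bucket": "spark-history",
                "endpoint": "http://minio.nuvolaris.svc.cluster.local:9000",
                "secretRef": "nuvolaris-minio",
            },
        },
    }
    for key, value in history_server_properties(spec).items():
        print(f"{key} {value}")
```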
   
   ## Testing
   
   Tested on **MicroK8s** cluster:
   - ✅ Spark Master deployment and healthy startup
   - ✅ History Server with MinIO integration
   - ✅ Resource limits properly applied
   - ✅ Service endpoints accessible (`spark://spark-master:7077`)
   - ✅ Web UI available on port 8080
   
   ### Verification
   ```bash
   $ kubectl -n nuvolaris get pods -l app=spark
   NAME                            READY   STATUS    RESTARTS   AGE
   spark-history-7b7d97c7d         1/1     Running   0          10m
   spark-master-0                  1/1     Running   0          10m
   ```
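
The manual check above could also be scripted. The following sketch parses the plain-text output of `kubectl get pods` and reports pods that are not fully ready. The helper name `not_ready_pods` is hypothetical and not part of this PR; the sample input is copied from the output above.

```python
def not_ready_pods(kubectl_output: str) -> list[str]:
    """Return names of pods that are not Running or not fully ready."""
    problems = []
    for line in kubectl_output.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a pod row.
        if len(fields) < 3 or fields[0] == "NAME":
            continue
        name, ready, status = fields[0], fields[1], fields[2]
        ready_count, _, total = ready.partition("/")
        if status != "Running" or ready_count != total:
            problems.append(name)
    return problems


SAMPLE_OUTPUT = """
NAME                            READY   STATUS    RESTARTS   AGE
spark-history-7b7d97c7d         1/1     Running   0          10m
spark-master-0                  1/1     Running   0          10m
"""

if __name__ == "__main__":
    failing = not_ready_pods(SAMPLE_OUTPUT)
    print("all pods ready" if not failing else f"not ready: {failing}")
```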
   
   ## Use Cases
   
   - **Data Processing**: Run Spark jobs within OpenServerless environment
   - **ETL Pipelines**: Process large datasets stored in MinIO
   - **Machine Learning**: Train models using Spark MLlib
   - **Analytics**: Query and analyze data alongside serverless functions
   
   ## Future Enhancements
   
   - [ ] Dynamic Spark Worker scaling
   - [ ] Spark application submission via operator API
   - [ ] Metrics integration with Prometheus
   - [ ] Support for Spark on Kubernetes mode
   - [ ] Jupyter notebook integration
   
   ## Documentation
   
   User documentation and examples will be added in follow-up PRs.
   
   ---
   
   **Related Issues**: Closes #[issue-number]
   
   **Operator Submodule PR**: mobs75/openserverless-operator#[pr-number]


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

