mobs75 opened a new pull request, #188:
URL: https://github.com/apache/openserverless/pull/188
## Summary
This PR adds Apache Spark integration to OpenServerless, enabling users to
deploy and manage a Spark standalone cluster alongside their serverless
workloads for big data processing.
## Architecture
The integration is implemented in the **operator submodule** and follows
OpenServerless patterns:
- Spark deployment managed by the Kubernetes operator
- Seamless integration with existing OpenServerless components (MinIO,
PostgreSQL, MongoDB, Redis)
- Declarative configuration through Whisk CRD
## Key Features
### Spark Components
- ✅ **Spark Master**: Standalone cluster manager with configurable resources
- ✅ **Spark History Server**: Web UI for completed applications with
S3-compatible storage
- ✅ **Spark Workers**: Foundation in place; dynamic scaling is listed under
Future Enhancements
### Technical Implementation
- **Resource Management**: Proper memory format handling (Kubernetes `1Gi` ↔
JVM `1g`)
- **Service Discovery**: Automatic DNS configuration for inter-component
communication
- **Storage Integration**: MinIO S3-compatible storage for Spark event logs
- **RBAC**: Least-privilege security with proper ServiceAccount and Role
bindings
- **Health Checks**: Comprehensive readiness/liveness probes
- **Lifecycle Management**: Owner references for automatic cleanup
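
The memory format handling mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code in `nuvolaris/spark.py`: the function names are hypothetical, and only whole-number quantities with binary suffixes (`Ki`, `Mi`, `Gi`, `Ti`) are handled.

```python
import re

# Kubernetes binary suffixes mapped to the single-letter JVM/Spark suffixes.
_K8S_TO_JVM = {"Ki": "k", "Mi": "m", "Gi": "g", "Ti": "t"}
_JVM_TO_K8S = {jvm: k8s for k8s, jvm in _K8S_TO_JVM.items()}


def k8s_to_jvm_memory(quantity: str) -> str:
    """Convert a Kubernetes quantity such as '1Gi' to a JVM-style size such as '1g'."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti)", quantity.strip())
    if match is None:
        # Decimal suffixes (e.g. '1G') and fractional values are out of scope here.
        raise ValueError(f"unsupported memory quantity: {quantity!r}")
    amount, suffix = match.groups()
    return f"{amount}{_K8S_TO_JVM[suffix]}"


def jvm_to_k8s_memory(size: str) -> str:
    """Convert a JVM-style size such as '512m' to a Kubernetes quantity such as '512Mi'."""
    match = re.fullmatch(r"(\d+)([kmgt])", size.strip().lower())
    if match is None:
        raise ValueError(f"unsupported JVM memory size: {size!r}")
    amount, suffix = match.groups()
    return f"{amount}{_JVM_TO_K8S[suffix]}"


if __name__ == "__main__":
    print(k8s_to_jvm_memory("1Gi"))
    print(jvm_to_k8s_memory("512m"))
```

Rejecting unknown formats with a `ValueError` rather than passing them through is a design choice for this sketch; it keeps a malformed `memory` value from reaching the JVM command line.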
## Changes
### Operator Submodule (commit `afc74b4`)
- **New Module**: `nuvolaris/spark.py` - Complete Spark operator
implementation
- **Templates**: Kubernetes manifests for RBAC, ConfigMaps, Services,
StatefulSets
- **Integration**: Hooks into main operator workflow (`patcher.py`,
`main.py`)
### Configuration
```yaml
apiVersion: nuvolaris.org/v1
kind: Whisk
metadata:
  name: controller
spec:
  components:
    spark: true
  spark:
    enabled: true
    mode: standalone
    image: apache/spark:3.5.0
    master:
      memory: 1Gi
      cpu: 1000m
    history:
      enabled: true
      backend: s3a
      s3a:
        bucket: spark-history
        endpoint: http://minio.nuvolaris.svc.cluster.local:9000
        secretRef: nuvolaris-minio
```
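
To illustrate how the `history` section might translate into Spark settings, here is a hedged sketch that maps the `spec.spark` block onto standard Spark and Hadoop S3A properties. The function name and the exact mapping are assumptions and may differ from what the operator does; credentials from `secretRef` are deliberately left out.

```python
def history_server_properties(spark_spec: dict) -> dict:
    """Derive Spark event-log properties from a `spec.spark` mapping (illustrative only)."""
    history = spark_spec.get("history", {})
    if not history.get("enabled", False):
        return {}
    if history.get("backend") != "s3a":
        raise ValueError("this sketch only covers the s3a backend")

    s3a = history["s3a"]
    log_dir = f"s3a://{s3a['bucket']}/"
    endpoint = s3a["endpoint"]
    return {
        # Applications write event logs here; the History Server reads them back.
        "spark.eventLog.enabled": "true",
        "spark.eventLog.dir": log_dir,
        "spark.history.fs.logDirectory": log_dir,
        # MinIO is usually addressed path-style rather than virtual-host-style.
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.connection.ssl.enabled": str(
            endpoint.startswith("https://")
        ).lower(),
    }


if __name__ == "__main__":
    spec = {
        "history": {
            "enabled": True,
            "backend": "s3a",
            "s3a": {
                "bucket": "spark-history",
                "endpoint": "http://minio.nuvolaris.svc.cluster.local:9000",
                "secretRef": "nuvolaris-minio",
            },
        }
    }
    for key, value in history_server_properties(spec).items():
        print(f"{key}={value}")
```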
## Testing
Tested on **MicroK8s** cluster:
- ✅ Spark Master deployment and healthy startup
- ✅ History Server with MinIO integration
- ✅ Resource limits properly applied
- ✅ Service endpoints accessible (`spark://spark-master:7077`)
- ✅ Web UI available on port 8080
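
A quick way to repeat the endpoint check from inside the cluster is a plain TCP probe. The sketch below is an assumption about how one might script it, not part of this PR; the helper names are hypothetical, and the probe only succeeds where `spark-master` resolves (for example from a pod in the same namespace).

```python
import socket
from urllib.parse import urlparse


def parse_master_url(url: str) -> tuple[str, int]:
    """Split a `spark://host:port` master URL into (host, port)."""
    parsed = urlparse(url)
    if parsed.scheme != "spark" or parsed.hostname is None or parsed.port is None:
        raise ValueError(f"not a valid Spark master URL: {url!r}")
    return parsed.hostname, parsed.port


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    host, port = parse_master_url("spark://spark-master:7077")
    print(f"master {host}:{port} reachable: {is_reachable(host, port)}")
    print(f"web UI {host}:8080 reachable: {is_reachable(host, 8080)}")
```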
### Verification
```bash
kubectl -n nuvolaris get pods -l app=spark
NAME                      READY   STATUS    RESTARTS   AGE
spark-history-7b7d97c7d   1/1     Running   0          10m
spark-master-0            1/1     Running   0          10m
```
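
For scripted checks, the same verification can be automated by parsing the `kubectl get pods` table. The following is a small sketch under the assumption that the default column layout (`NAME READY STATUS ...`) is used; the function name is hypothetical, and the sample text mirrors the output above.

```python
def unready_pods(kubectl_output: str) -> list[str]:
    """Return names of pods that are not Running or not fully ready."""
    lines = [line for line in kubectl_output.strip().splitlines() if line.strip()]
    unready = []
    for line in lines[1:]:  # the first line is the column header
        name, ready, status = line.split()[:3]
        ready_count, total = ready.split("/")
        if status != "Running" or ready_count != total:
            unready.append(name)
    return unready


SAMPLE = """\
NAME                      READY   STATUS    RESTARTS   AGE
spark-history-7b7d97c7d   1/1     Running   0          10m
spark-master-0            1/1     Running   0          10m
"""

if __name__ == "__main__":
    # In practice the text would come from:
    #   kubectl -n nuvolaris get pods -l app=spark
    print(unready_pods(SAMPLE))
```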
## Use Cases
- **Data Processing**: Run Spark jobs within OpenServerless environment
- **ETL Pipelines**: Process large datasets stored in MinIO
- **Machine Learning**: Train models using Spark MLlib
- **Analytics**: Query and analyze data alongside serverless functions
## Future Enhancements
- [ ] Dynamic Spark Worker scaling
- [ ] Spark application submission via operator API
- [ ] Metrics integration with Prometheus
- [ ] Support for Spark on Kubernetes mode
- [ ] Jupyter notebook integration
## Documentation
User documentation and examples to be added in follow-up PRs.
---
**Related Issues**: Closes #[issue-number]
**Operator Submodule PR**: mobs75/openserverless-operator#[pr-number]