Baunsgaard opened a new pull request, #2520:
URL: https://github.com/apache/systemds/pull/2520

   Add scripts/databricks: a self-contained kit for deploying and running 
SystemDS on a Databricks cluster (tested on DBR 16.4 LTS, Spark 3.5.2, Scala 
2.12, where the SystemDS jar runs unchanged).
   
   - deploy.sh: create a Unity Catalog (UC) volume, upload SystemDS.jar, create 
a single-user cluster with the required Vector API / --add-opens JVM flags, 
install the Delta Kernel Maven libraries, and import the demo notebooks. All 
settings come from a .env file (template in .env.example); local state (.env, 
.cluster_id) is git-ignored.
   - SystemDS_MLContext_Demo.scala: Unity Catalog table round-trip via the 
MLContext (Scala) API with a configurable DML script and execution mode.
   - SystemDS_vs_SparkML_LinReg.scala: linear regression with categorical 
encoding (transformencode + lm) vs a Spark ML OneHotEncoder + LinearRegression 
pipeline, timing encode + train.
   - SystemDS_Delta_E2E.scala: end-to-end Delta -> transformencode -> lm, 
reading a Delta table natively as a frame, compared against the equivalent 
Spark ML pipeline; prints a per-instruction breakdown.
   - SystemDS_Python_Demo.py and demo.dml: minimal Python API and DML smoke 
tests.
   - README.md: setup, configuration, node-type guidance, Delta Kernel library 
requirement, and indicative benchmark numbers.
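
   As a rough illustration of the .env-driven configuration described for 
deploy.sh, the sketch below parses a KEY=VALUE file in Python. It is not taken 
from the PR: deploy.sh is a shell script and may simply source the file, and 
the keys CLUSTER_NAME and NODE_TYPE are hypothetical placeholders (the real 
names are in .env.example).

```python
import os
import tempfile


def load_env(path):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            key, sep, value = line.partition("=")
            if not sep:
                continue  # ignore malformed lines without '='
            # Drop optional surrounding quotes around the value.
            settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


# Hypothetical keys for illustration only; see .env.example for the real ones.
with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, ".env")
    with open(env_path, "w", encoding="utf-8") as fh:
        fh.write('# example settings\n')
        fh.write('CLUSTER_NAME="systemds-demo"\n')
        fh.write('NODE_TYPE=example-node\n')
    settings = load_env(env_path)
    print(settings)
```

   Keeping every setting in one git-ignored file means workspace-specific 
values stay out of the repository while the committed template documents which 
keys are expected.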

