This is an automated email from the ASF dual-hosted git repository.
wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new bab98afd490b [SPARK-54231][SDP] Fill gaps in SDP docs
bab98afd490b is described below
commit bab98afd490b0aa553f7b5d9f75d65d204b66db0
Author: Sandy Ryza <[email protected]>
AuthorDate: Fri Nov 7 13:03:12 2025 -0800
[SPARK-54231][SDP] Fill gaps in SDP docs
### What changes were proposed in this pull request?
Fill some gaps in the SDP docs:
- Document storage field in pipeline spec
- Update documentation libraries field to reflect new name
- Provide installation instructions
### Why are the changes needed?
To make the docs more accurate.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
### Was this patch authored or co-authored using generative AI tooling?
Closes #52929 from sryza/sdp-docs-install-instructions.
Authored-by: Sandy Ryza <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
---
docs/declarative-pipelines-programming-guide.md | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)
diff --git a/docs/declarative-pipelines-programming-guide.md b/docs/declarative-pipelines-programming-guide.md
index 3e33153e3d25..3932d472cf65 100644
--- a/docs/declarative-pipelines-programming-guide.md
+++ b/docs/declarative-pipelines-programming-guide.md
@@ -35,6 +35,16 @@ The key advantage of SDP is its declarative approach - you define what tables sh

+### Quick install
+
+A quick way to install SDP is with pip:
+
+```
+pip install pyspark[pipelines]
+```
+
+See the [downloads page](//spark.apache.org/downloads.html) for more installation options.
+
## Key Concepts
### Flows
@@ -67,8 +77,9 @@ A pipeline is the primary unit of development and execution in SDP. A pipeline c
A pipeline project is a set of source files that contain code that define the datasets and flows that make up a pipeline. These source files can be `.py` or `.sql` files.
A YAML-formatted pipeline spec file contains the top-level configuration for the pipeline project. It supports the following fields:
-- **definitions** (Required) - Paths where definition files can be found.
-- **database** (Optional) - The default target database for pipeline outputs.
+- **libraries** (Required) - Paths where source files can be found.
+- **storage** (Required) - A directory where checkpoints can be stored for streams within the pipeline.
+- **database** (Optional) - The default target database for pipeline outputs. **schema** can alternatively be used as an alias.
- **catalog** (Optional) - The default target catalog for pipeline outputs.
- **configuration** (Optional) - Map of Spark configuration properties.
@@ -76,11 +87,9 @@ An example pipeline spec file:
```yaml
name: my_pipeline
-definitions:
-  - glob:
-      include: transformations/**/*.py
+libraries:
   - glob:
-      include: transformations/**/*.sql
+      include: transformations/**
catalog: my_catalog
database: my_db
configuration:
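
For reference, combining the fields documented in this change, a pipeline spec using the renamed `libraries` field and the new `storage` field might look like the following sketch (the pipeline name, storage path, and catalog/database values are illustrative, not taken from this commit):

```yaml
# Hypothetical pipeline spec illustrating the documented fields.
name: my_pipeline
libraries:          # (Required) paths where source files can be found
  - glob:
      include: transformations/**
storage: /tmp/my_pipeline/storage   # (Required) example checkpoint directory for streams
catalog: my_catalog                 # (Optional) default target catalog
database: my_db                     # (Optional) default target database
configuration:                      # (Optional) map of Spark configuration properties
  spark.sql.shuffle.partitions: "200"
```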
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]