This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new bab98afd490b [SPARK-54231][SDP] Fill gaps in SDP docs
bab98afd490b is described below

commit bab98afd490b0aa553f7b5d9f75d65d204b66db0
Author: Sandy Ryza <[email protected]>
AuthorDate: Fri Nov 7 13:03:12 2025 -0800

    [SPARK-54231][SDP] Fill gaps in SDP docs
    
    ### What changes were proposed in this pull request?
    
    Fill some gaps in the SDP docs:
    - Document storage field in pipeline spec
    - Update the documentation of the `libraries` field to reflect its new name
    - Provide installation instructions
    
    ### Why are the changes needed?
    
    To make the docs more accurate.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No
    
    ### How was this patch tested?
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Closes #52929 from sryza/sdp-docs-install-instructions.
    
    Authored-by: Sandy Ryza <[email protected]>
    Signed-off-by: Wenchen Fan <[email protected]>
---
 docs/declarative-pipelines-programming-guide.md | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/docs/declarative-pipelines-programming-guide.md b/docs/declarative-pipelines-programming-guide.md
index 3e33153e3d25..3932d472cf65 100644
--- a/docs/declarative-pipelines-programming-guide.md
+++ b/docs/declarative-pipelines-programming-guide.md
@@ -35,6 +35,16 @@ The key advantage of SDP is its declarative approach - you define what tables sh
 
 ![Dataflow Graph](img/declarative-pipelines-dataflow-graph.png)
 
+### Quick install
+
+A quick way to install SDP is with pip:
+
+```
+pip install pyspark[pipelines]
+```
+
+See the [downloads page](//spark.apache.org/downloads.html) for more installation options.
+
 ## Key Concepts
 
 ### Flows
@@ -67,8 +77,9 @@ A pipeline is the primary unit of development and execution in SDP. A pipeline c
 A pipeline project is a set of source files that contain code that defines the datasets and flows that make up a pipeline. These source files can be `.py` or `.sql` files.
 
 A YAML-formatted pipeline spec file contains the top-level configuration for the pipeline project. It supports the following fields:
-- **definitions** (Required) - Paths where definition files can be found.
-- **database** (Optional) - The default target database for pipeline outputs.
+- **libraries** (Required) - Paths where source files can be found.
+- **storage** (Required) - A directory where checkpoints can be stored for streams within the pipeline.
+- **database** (Optional) - The default target database for pipeline outputs. **schema** can alternatively be used as an alias.
 - **catalog** (Optional) - The default target catalog for pipeline outputs.
 - **configuration** (Optional) - Map of Spark configuration properties.
 
@@ -76,11 +87,9 @@ An example pipeline spec file:
 
 ```yaml
 name: my_pipeline
-definitions:
-  - glob:
-      include: transformations/**/*.py
+libraries:
   - glob:
-      include: transformations/**/*.sql
+      include: transformations/**
 catalog: my_catalog
 database: my_db
 configuration:

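Putting the new quick-install section into practice end to end might look like the sketch below. It assumes the `spark-pipelines` CLI described elsewhere in the same programming guide, with its `init` and `run` subcommands; the project name is a placeholder.

```bash
# Install PySpark with Declarative Pipelines support, per the new docs section.
# Quoting the extra keeps shells like zsh from treating the brackets as a glob.
pip install 'pyspark[pipelines]'

# Assumed workflow from the rest of the guide: scaffold a project, then run it
# from the project root, where the pipeline spec file lives.
spark-pipelines init --name my_pipeline
cd my_pipeline
spark-pipelines run
```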

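The updated example spec in the diff stops short of the newly documented required **storage** field, so a hypothetical spec exercising every documented field may help; the storage path and configuration value below are illustrative only.

```yaml
name: my_pipeline
libraries:
  - glob:
      include: transformations/**
# Required: where checkpoints for streams in the pipeline are stored
# (illustrative local path; a durable shared location is typical in practice).
storage: file:///tmp/my_pipeline/storage
catalog: my_catalog
database: my_db  # "schema" can be used as an alias for this field
configuration:
  spark.sql.shuffle.partitions: "10"
```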
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
