This is an automated email from the ASF dual-hosted git repository.
sandy pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 55061e9c2a58 [SPARK-55223][DOCS] Document sinks in declarative pipelines programming guide
55061e9c2a58 is described below
commit 55061e9c2a589aeb067cf48857167a5a0136caf3
Author: Sandy Ryza <[email protected]>
AuthorDate: Tue Jan 27 08:27:08 2026 -0800
[SPARK-55223][DOCS] Document sinks in declarative pipelines programming guide
### What changes were proposed in this pull request?
Adds a section on sinks to the declarative pipelines programming guide page in the docs
### Why are the changes needed?
The SDP programming guide should cover all SDP features
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
### Was this patch authored or co-authored using generative AI tooling?
Closes #53991 from sryza/sink-docs.
Authored-by: Sandy Ryza <[email protected]>
Signed-off-by: Sandy Ryza <[email protected]>
---
docs/declarative-pipelines-programming-guide.md | 46 +++++++++++++++++++++++++
1 file changed, 46 insertions(+)
diff --git a/docs/declarative-pipelines-programming-guide.md b/docs/declarative-pipelines-programming-guide.md
index 859258a430dd..5b3a06fe26c0 100644
--- a/docs/declarative-pipelines-programming-guide.md
+++ b/docs/declarative-pipelines-programming-guide.md
@@ -449,6 +449,52 @@ AS INSERT INTO customers_us
SELECT * FROM STREAM(customers_us_east);
```
+## Writing Data to External Targets with Sinks
+
+Sinks in SDP provide a way to write transformed data to external destinations beyond the default streaming tables and materialized views. Sinks are particularly useful for operational use cases that require low-latency data processing, reverse ETL operations, or writing to external systems.
+
+Sinks enable a pipeline to write to any destination that a Spark Structured Streaming query can write to, including, but not limited to, **Apache Kafka** and **Azure Event Hubs**.
+
+### Creating and Using Sinks in Python
+
+Working with sinks involves two main steps: creating the sink definition and implementing an append flow to write data.
+
+#### Creating a Kafka Sink
+
+You can create a sink that streams data to a Kafka topic:
+
+```python
+from pyspark import pipelines as dp
+from pyspark.sql import DataFrame
+from pyspark.sql.functions import col, to_json, struct
+
+dp.create_sink(
+ name="kafka_sink",
+ format="kafka",
+ options={
+ "kafka.bootstrap.servers": "localhost:9092",
+ "topic": "processed_orders"
+ }
+)
+
+@dp.append_flow(target="kafka_sink")
+def kafka_orders_flow() -> DataFrame:
+ return (
+ spark.readStream.table("customer_orders")
+ .select(
+ col("order_id").cast("string").alias("key"),
+ to_json(struct("*")).alias("value")
+ )
+ )
+```
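+
+Because sinks write through Spark Structured Streaming, the same `format="kafka"` sink can also reach **Azure Event Hubs** via its Kafka-compatible endpoint. The sketch below is illustrative, not prescriptive: the namespace, connection string, and topic name are placeholders, and it assumes that, as in the Kafka example above, `kafka.`-prefixed options are forwarded to the underlying Kafka writer.
+
+```python
+from pyspark import pipelines as dp
+
+# Placeholders -- substitute your own Event Hubs namespace and connection string.
+eventhubs_namespace = "my-namespace"
+eventhubs_connection_string = "Endpoint=sb://my-namespace.servicebus.windows.net/;..."
+
+dp.create_sink(
+    name="event_hubs_sink",
+    format="kafka",
+    options={
+        # Event Hubs exposes its Kafka-compatible endpoint on port 9093.
+        "kafka.bootstrap.servers": f"{eventhubs_namespace}.servicebus.windows.net:9093",
+        "kafka.security.protocol": "SASL_SSL",
+        "kafka.sasl.mechanism": "PLAIN",
+        "kafka.sasl.jaas.config": (
+            "org.apache.kafka.common.security.plain.PlainLoginModule required "
+            f'username="$ConnectionString" password="{eventhubs_connection_string}";'
+        ),
+        # The Event Hub to write to (equivalent to a Kafka topic).
+        "topic": "processed_orders",
+    },
+)
+```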
+
+### Sink Considerations
+
+When working with sinks, keep the following considerations in mind:
+
+- **Streaming-only**: Sinks currently support only streaming queries through the `append_flow` decorator (see the sketch after this list)
+- **Python API**: Sink functionality is available only through the Python API, not SQL
+- **Append-only**: Only append operations are supported; full refresh updates reset checkpoints but do not clean up previously computed results
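+
+To make the streaming-only point concrete, here is a minimal, illustrative flow that feeds the hypothetical `event_hubs_sink` sketched above. The flow function has to return a streaming DataFrame, so it reads with `spark.readStream` rather than a batch `spark.read`:
+
+```python
+from pyspark import pipelines as dp
+from pyspark.sql import DataFrame
+from pyspark.sql.functions import to_json, struct
+
+@dp.append_flow(target="event_hubs_sink")
+def event_hubs_orders_flow() -> DataFrame:
+    # `spark` is the active SparkSession, as in the Kafka example above.
+    # A streaming read is required here; a batch read such as
+    # spark.read.table("customer_orders") is not accepted by sink flows.
+    return (
+        spark.readStream.table("customer_orders")
+        .select(to_json(struct("*")).alias("value"))
+    )
+```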
+
## Important Considerations
### Python Considerations
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]