This is an automated email from the ASF dual-hosted git repository.
sandy pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 55061e9c2a58 [SPARK-55223][DOCS] Document sinks in declarative pipelines programming guide
55061e9c2a58 is described below
commit 55061e9c2a589aeb067cf48857167a5a0136caf3
Author: Sandy Ryza <[email protected]>
AuthorDate: Tue Jan 27 08:27:08 2026 -0800
[SPARK-55223][DOCS] Document sinks in declarative pipelines programming guide
### What changes were proposed in this pull request?
Adds a section on sinks to the declarative pipelines programming guide page in the docs
### Why are the changes needed?
The SDP programming guide should cover all SDP features
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
### Was this patch authored or co-authored using generative AI tooling?
Closes #53991 from sryza/sink-docs.
Authored-by: Sandy Ryza <[email protected]>
Signed-off-by: Sandy Ryza <[email protected]>
---
docs/declarative-pipelines-programming-guide.md | 46 +++++++++++++++++++++++++
1 file changed, 46 insertions(+)
diff --git a/docs/declarative-pipelines-programming-guide.md b/docs/declarative-pipelines-programming-guide.md
index 859258a430dd..5b3a06fe26c0 100644
--- a/docs/declarative-pipelines-programming-guide.md
+++ b/docs/declarative-pipelines-programming-guide.md
@@ -449,6 +449,52 @@ AS INSERT INTO customers_us
SELECT * FROM STREAM(customers_us_east);
```
+## Writing Data to External Targets with Sinks
+
+Sinks in SDP provide a way to write transformed data to external destinations beyond the default streaming tables and materialized views. Sinks are particularly useful for operational use cases that require low-latency data processing, reverse ETL operations, or writing to external systems.
+
+Sinks enable a pipeline to write to any destination that a Spark Structured Streaming query can write to, including, but not limited to, **Apache Kafka** and **Azure Event Hubs**.
+
+### Creating and Using Sinks in Python
+
+Working with sinks involves two main steps: creating the sink definition and implementing an append flow to write data.
+
+#### Creating a Kafka Sink
+
+You can create a sink that streams data to a Kafka topic:
+
+```python
+from pyspark import pipelines as dp
+from pyspark.sql import DataFrame
+from pyspark.sql.functions import col, to_json, struct
+
+dp.create_sink(
+ name="kafka_sink",
+ format="kafka",
+ options={
+ "kafka.bootstrap.servers": "localhost:9092",
+ "topic": "processed_orders"
+ }
+)
+
+@dp.append_flow(target="kafka_sink")
+def kafka_orders_flow() -> DataFrame:
+ return (
+ spark.readStream.table("customer_orders")
+ .select(
+ col("order_id").cast("string").alias("key"),
+ to_json(struct("*")).alias("value")
+ )
+ )
+```
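+
+Because sinks write through Spark Structured Streaming, the same `format="kafka"` sink can also reach **Azure Event Hubs** via its Kafka-compatible endpoint. The sketch below is illustrative, not prescriptive: the namespace, connection string, and topic name are placeholders, and it assumes that, as in the Kafka example above, `kafka.`-prefixed options are forwarded to the underlying Kafka writer.
+
+```python
+from pyspark import pipelines as dp
+
+# Placeholders -- substitute your own Event Hubs namespace and connection string.
+eventhubs_namespace = "my-namespace"
+eventhubs_connection_string = "Endpoint=sb://my-namespace.servicebus.windows.net/;..."
+
+dp.create_sink(
+    name="event_hubs_sink",
+    format="kafka",
+    options={
+        # Event Hubs exposes its Kafka-compatible endpoint on port 9093.
+        "kafka.bootstrap.servers": f"{eventhubs_namespace}.servicebus.windows.net:9093",
+        "kafka.security.protocol": "SASL_SSL",
+        "kafka.sasl.mechanism": "PLAIN",
+        "kafka.sasl.jaas.config": (
+            "org.apache.kafka.common.security.plain.PlainLoginModule required "
+            f'username="$ConnectionString" password="{eventhubs_connection_string}";'
+        ),
+        # The Event Hub to write to (equivalent to a Kafka topic).
+        "topic": "processed_orders",
+    },
+)
+```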
+
+### Sink Considerations
+
+When working with sinks, keep the following considerations in mind:
+
+- **Streaming-only**: Sinks currently support only streaming queries through the `append_flow` decorator (see the sketch after this list)
+- **Python API**: Sink functionality is available only through the Python API, not SQL
+- **Append-only**: Only append operations are supported; full refresh updates reset checkpoints but do not clean up previously computed results
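+
+To make the streaming-only point concrete, here is a minimal, illustrative flow that feeds the hypothetical `event_hubs_sink` sketched above. The flow function has to return a streaming DataFrame, so it reads with `spark.readStream` rather than a batch `spark.read`:
+
+```python
+from pyspark import pipelines as dp
+from pyspark.sql import DataFrame
+from pyspark.sql.functions import to_json, struct
+
+@dp.append_flow(target="event_hubs_sink")
+def event_hubs_orders_flow() -> DataFrame:
+    # `spark` is the active SparkSession, as in the Kafka example above.
+    # A streaming read is required here; a batch read such as
+    # spark.read.table("customer_orders") is not accepted by sink flows.
+    return (
+        spark.readStream.table("customer_orders")
+        .select(to_json(struct("*")).alias("value"))
+    )
+```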
+
## Important Considerations
### Python Considerations
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]