Hello Spark devs,

I built stopstreaming, a small library that allows you to stop Spark
Structured Streaming jobs through signals by adding a single extension
method to the StreamingQuery. The library extends StreamingQuery with a
single method: query.awaitExternalTermination(config).

Github: https://github.com/a-biratsis/stopstreaming

The goal is to stop Spark Structured Streaming jobs through external
signals, without relying on calling StreamingQuery.stop() directly, by
leveraging Spark's existing API (StreamingQueryListener is used internally).

Currently two watchers are available:

   - *REST*: embeds a lightweight HTTP server on the Spark driver. User
   sends a POST /stop/<query-id> from any orchestrator, notebook, or curl
   command.
   - *FileSystem*: creates a marker file at startup. Delete it (locally or
   from DBFS) and the query stops. Works in Databricks with dbutils.fs.rm(...)
   or in any other FS using rm.

Soon I will add more watchers such as Kafka and Azure Service Bus, and of
course I am open to new ideas/suggestions.

Please let me know if you find the above useful and it could enhance or
extend the current Spark API.

Best,
Alex

Reply via email to