beenhead opened a new pull request, #19547:
URL: https://github.com/apache/druid/pull/19547

   Have you ever stood up a streaming supervisor, only to later wish you could 
replay that same data as a batch (SQL-based) ingestion? Well, be prepared to 
have your experience taken to the next level!
   
   This PR adds a `Convert supervisor to SQL` flow to the web console query 
view. It complements the existing [Convert ingestion spec to 
SQL](https://github.com/apache/druid/pull/12919) tool: where that one migrates 
native batch and Hadoop specs, this one takes a streaming supervisor 
(Kafka/Kinesis) and generates an equivalent [Multi-Stage 
Query](https://druid.apache.org/docs/latest/multi-stage-query/) INSERT 
statement that reads from files instead of a stream — handy for backfills, 
reprocessing, and one-off migrations.
   
   ### Convert supervisor to SQL dialog
   A new `Convert supervisor to SQL` item is available from the `...` menu in 
the query view.
   
   <img width="983" height="507" alt="image" 
src="https://github.com/user-attachments/assets/b9107858-82e2-4bdb-8368-cf21f8d9e75b" 
/>
   
   Clicking it opens a dialog that walks you through the conversion.
   
   <img width="295" alt="image" 
src="https://github.com/user-attachments/assets/df5c6ba5-a968-4830-90ef-cdf9e963e178" 
/>
   
   ### Pick your supervisor
   You can either select an existing supervisor (the dialog fetches the list 
from `/druid/indexer/v1/supervisor` and loads the spec on selection) or paste a 
supervisor JSON spec directly. Pasted JSON is validated as you type, so a 
malformed spec surfaces an inline error rather than failing silently.
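
As a rough illustration of the validate-as-you-type behaviour, the sketch below parses pasted text and returns either a spec or a message suitable for an inline error. The names `ParseResult` and `parseSupervisorJson`, the error wording, and the restriction to `kafka`/`kinesis` types are assumptions made for this example, not the dialog's actual code.

```typescript
// Hypothetical result type: a parsed spec, or an error message to show inline.
type ParseResult =
  | { ok: true; spec: Record<string, unknown> }
  | { ok: false; error: string };

// Assumption: only Kafka and Kinesis supervisors are accepted.
const STREAMING_TYPES = ['kafka', 'kinesis'];

function parseSupervisorJson(text: string): ParseResult {
  if (text.trim() === '') {
    return { ok: false, error: 'Paste a supervisor spec' };
  }

  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch (e) {
    // Surface the parser's message instead of failing silently.
    const reason = e instanceof Error ? e.message : String(e);
    return { ok: false, error: `Invalid JSON: ${reason}` };
  }

  if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
    return { ok: false, error: 'Supervisor spec must be a JSON object' };
  }

  const spec = parsed as Record<string, unknown>;
  const type = spec['type'];
  if (typeof type !== 'string' || !STREAMING_TYPES.includes(type)) {
    return { ok: false, error: 'Expected a supervisor of type "kafka" or "kinesis"' };
  }
  return { ok: true, spec };
}
```

Returning a tagged result rather than throwing keeps the input handler simple: it can render `error` next to the text area on every keystroke.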
   
   ### Point it at your data
   Because a streaming supervisor has no batch input source, the dialog asks 
where the equivalent files live. It pre-populates the file location from the 
supervisor's `ioConfig.inputSource` when one is present, and you can pick the 
file type (JSON, CSV, Parquet, or ORC). The location scheme is used to build 
the right input source:
   
   - `s3://…` → an `s3` input source (with an `objectGlob` added automatically 
for directory locations)
   - `gs://…` → a `google` input source
   - `http://…` / `https://…` → an `http` input source
   - anything else (including `file://…`) → a `local` input source
   
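
The scheme-to-input-source mapping above could be sketched roughly as follows. This is an illustration under assumptions, not the PR's helper: the function name `inputSourceFor`, the trailing-slash test for "directory", the glob and filter patterns, and the choice between `uris`, `prefixes`, `files`, and `baseDir` are all guesses made for the example.

```typescript
type FileType = 'json' | 'csv' | 'parquet' | 'orc';

// Loose shape for an input source object; only `type` is constrained here.
interface InputSource {
  type: 's3' | 'google' | 'http' | 'local';
  [key: string]: unknown;
}

function inputSourceFor(location: string, fileType: FileType): InputSource {
  // Assumption: a trailing slash marks a directory rather than a single file.
  const isDirectory = location.endsWith('/');

  if (location.startsWith('s3://')) {
    return isDirectory
      ? { type: 's3', prefixes: [location], objectGlob: `**.${fileType}` }
      : { type: 's3', uris: [location] };
  }
  if (location.startsWith('gs://')) {
    return isDirectory
      ? { type: 'google', prefixes: [location] }
      : { type: 'google', uris: [location] };
  }
  if (/^https?:\/\//.test(location)) {
    return { type: 'http', uris: [location] };
  }

  // Everything else, including file://, is treated as a local path.
  const path = location.replace(/^file:\/\//, '');
  return isDirectory
    ? { type: 'local', baseDir: path, filter: `*.${fileType}` }
    : { type: 'local', files: [path] };
}
```

For example, `inputSourceFor('s3://bucket/events/', 'json')` yields an `s3` source with a prefix and an `objectGlob`, while `inputSourceFor('file:///tmp/a.csv', 'csv')` yields a `local` source pointing at `/tmp/a.csv`.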
   ### Generate the SQL
   Clicking Generate SQL converts the spec and drops the resulting query into a 
new tab for you to review and edit before running. The conversion:
   
   - Builds a `SELECT … FROM TABLE(EXTERN(…))` over the chosen files
   - Maps the supervisor's `metricsSpec` aggregators to their SQL equivalents 
(`longSum` → `SUM`, `thetaSketch` → `APPROX_COUNT_DISTINCT_DS_THETA`, 
`HLLSketchBuild` → `APPROX_COUNT_DISTINCT_DS_HLL`, and so on), adding a 
`GROUP BY` when rollup aggregations are present
   - Parses the timestamp via `TIME_PARSE` using the supervisor's 
`timestampSpec`
   - Emits `PARTITIONED BY DAY` and `CLUSTERED BY` the leading dimensions
   
   <img width="983" height="440" alt="image" 
src="https://github.com/user-attachments/assets/95c4ec74-523d-469f-aa36-1e483e268b8c" 
/>
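
To make the metric mapping concrete, here is a minimal sketch that covers only the three aggregators named above. The names `MetricSpec`, `metricToSql`, and `buildSelectParts`, the ordinal `GROUP BY` form, and the quoting style are assumptions for illustration; the PR's helper supports more aggregators and their non-default arguments.

```typescript
interface MetricSpec {
  type: string;
  name: string;
  fieldName?: string;
}

// Only the mappings named in the description; anything else is unsupported here.
const AGGREGATOR_TO_SQL = new Map<string, string>([
  ['longSum', 'SUM'],
  ['thetaSketch', 'APPROX_COUNT_DISTINCT_DS_THETA'],
  ['HLLSketchBuild', 'APPROX_COUNT_DISTINCT_DS_HLL'],
]);

// Returns a SELECT expression, or undefined for an unsupported metric.
function metricToSql(metric: MetricSpec): string | undefined {
  const fn = AGGREGATOR_TO_SQL.get(metric.type);
  if (fn === undefined || metric.fieldName === undefined) return undefined;
  return `${fn}("${metric.fieldName}") AS "${metric.name}"`;
}

interface SelectParts {
  columns: string[];
  groupBy: string | undefined;
}

function buildSelectParts(dimensions: string[], metrics: MetricSpec[]): SelectParts {
  // Unsupported metrics are dropped rather than failing the conversion.
  const metricColumns = metrics
    .map(m => metricToSql(m))
    .filter((expr): expr is string => expr !== undefined);

  const columns = [...dimensions.map(d => `"${d}"`), ...metricColumns];

  // GROUP BY is only needed when at least one aggregation survives.
  const needsGroupBy = metricColumns.length > 0 && dimensions.length > 0;
  const groupBy = needsGroupBy
    ? `GROUP BY ${dimensions.map((_, i) => String(i + 1)).join(', ')}`
    : undefined;

  return { columns, groupBy };
}
```

With dimensions `['channel']` and a single `longSum` metric on `added`, this produces the columns `"channel"` and `SUM("added") AS "added"` plus `GROUP BY 1`; with no supported metrics, `groupBy` is left undefined and the query stays a plain projection.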
   
   ### Tests
   Added `supervisor-conversion.spec.ts` covering the conversion helper: rollup 
vs. non-rollup queries, the full set of supported metric aggregations and their 
non-default arguments, dropping of unsupported metrics, input-source detection 
for each scheme, timestamp handling, partitioning/clustering, and the error 
paths.
   The existing `supervisor-to-sql-dialog.spec.tsx` covers basic rendering and 
the close action.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

