sollhui opened a new issue, #56191: URL: https://github.com/apache/doris/issues/56191
### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.

### Description

## Background

As an OLAP system, Doris has a basic requirement to handle data from different sources, such as S3, Kafka, and CDC. Traditional incremental pipelines rely on extra middleware or self-written jobs, which brings two problems:

1. Latency: the more systems the data flows through, the more latency it unavoidably accumulates.
2. Complexity: more components need to be maintained.

We therefore introduce **Streaming Job**, a native ingestion path that moves incremental data directly into Doris with **low latency**, **simplicity**, and an **exactly-once semantics** guarantee.

## Design

### Grammar

Reuse the existing job syntax and simply mark the job `ON STREAMING`:

```
CREATE JOB example_job
PROPERTIES (
)
ON STREAMING
DO INSERT INTO db.tbl SELECT * FROM tvf()
```

Users can alter the job:

```
ALTER JOB FOR jobName
PROPERTIES (
)
ON STREAMING
DO INSERT INTO db.tbl SELECT * FROM tvf()
```

Query jobs:

```
SELECT * FROM job(type=insert) WHERE ExecuteType = 'streaming'
```

### Schedule

Architecture diagram:

<img width="2912" height="1230" alt="Image" src="https://github.com/user-attachments/assets/4043f609-f723-42ea-8629-c5001c97822a" />

The scheduler consists of job scheduling and task scheduling:

1. Job schedule (time-driven): reuses the time-wheel logic to generate job scheduler subtasks at regular intervals.
2. Task schedule (event-driven): scheduling is triggered by the callback issued when a task completes.

### Offset management

Exactly-once is achieved through persistent offsets plus two-phase task verification:

- The offset is committed only after the data is visible and durable in Doris.
- Each task carries a monotonic ID; the scheduler rejects any duplicate or out-of-order task, eliminating replay risk.

### Use case

_No response_

### Related issues

_No response_

### Are you willing to submit PR?

- [x] Yes I am willing to submit a PR!
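The offset-management rules from the Design section (monotonic task IDs, rejecting duplicate or out-of-order tasks, committing the offset only after data is visible) can be illustrated with a minimal Python sketch. It assumes a single in-flight task per job and integer source offsets, and it keeps the "persisted" offset in memory. `Task`, `StreamingJobSketch`, `schedule_next`, and `on_task_finished` are hypothetical names for illustration only, not Doris APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Task:
    task_id: int       # monotonic ID assigned by the scheduler
    start_offset: int  # inclusive start of the source range this task reads
    end_offset: int    # exclusive end of the source range


class StreamingJobSketch:
    """Hypothetical coordinator for one streaming job (illustration only)."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self.committed_offset = 0  # stands in for the persisted offset
        self.last_task_id = 0      # highest task ID ever issued
        self.in_flight: Optional[Task] = None

    def schedule_next(self) -> Task:
        # Assumption: at most one task runs at a time, and every new task
        # starts from the last committed offset.
        if self.in_flight is not None:
            raise RuntimeError("a task is already running")
        self.last_task_id += 1
        task = Task(self.last_task_id, self.committed_offset,
                    self.committed_offset + self.batch_size)
        self.in_flight = task
        return task

    def on_task_finished(self, task: Task, data_visible: bool) -> bool:
        """Completion callback. Returns True only if the offset was committed."""
        # Reject duplicate or out-of-order completions: only the task that is
        # currently in flight may advance the offset.
        if self.in_flight is None or task.task_id != self.in_flight.task_id:
            return False
        self.in_flight = None
        # Phase 1 (not modeled here): the task loads its range into Doris and
        # reports whether the data is visible and durable.
        if not data_visible:
            # Keep the old offset so the same range is retried by a new task ID.
            return False
        # Phase 2: the data is visible, so advance the persisted offset.
        self.committed_offset = task.end_offset
        return True
```

In an event-driven loop, the caller would invoke `schedule_next()` right after each `on_task_finished(...)` callback. A late or repeated completion (for example, task 1 reporting again while task 2 is running) is simply ignored, so the same range is never committed twice. A failed task leaves the offset unchanged, and the retry gets a fresh, higher ID.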
### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
