sollhui opened a new issue, #56191:
URL: https://github.com/apache/doris/issues/56191

   ### Search before asking
   
   - [x] I had searched in the 
[issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no 
similar issues.
   
   
   ### Description
   
   ## Background
   
   An OLAP system must be able to ingest data from many different sources, 
such as S3, Kafka, and CDC. Traditional incremental pipelines rely on extra 
middleware or hand-written jobs, which brings two problems:
   
   1. Delay: each additional system the data flows through adds latency.
   
   2. Complexity: more components have to be deployed and maintained.
   
   We therefore introduce **Streaming Job**, a native ingest path that moves 
incremental data directly into Doris, offering **low latency**, 
**simplicity**, and an **exactly-once semantics** guarantee.
   
   ## Design
   
   ### Grammar
   Reuse the existing job syntax and simply mark the job `ON STREAMING`:
   ```
   CREATE JOB example_job
   PROPERTIES (
   )
   ON STREAMING
   DO
   INSERT INTO db.tbl
   SELECT * FROM tvf()
   ```
   
   Users can alter an existing job:
   ```
   ALTER JOB FOR jobName
   PROPERTIES (
   )
   ON STREAMING
   DO
   INSERT INTO db.tbl
   SELECT * FROM tvf()
   ```
   
   Query streaming jobs:
   ```
   SELECT * FROM jobs("type"="insert") WHERE ExecuteType = "STREAMING"
   ```
   
   ### Schedule
   Architecture diagram:
   <img width="2912" height="1230" alt="Image" 
src="https://github.com/user-attachments/assets/4043f609-f723-42ea-8629-c5001c97822a"
 />
   
   Scheduling happens at two levels, job scheduling and task scheduling:
   
   1. Job Schedule (time-driven): reuses the existing time-wheel logic to 
generate job scheduling subtasks at regular intervals.
   
   2. Task Schedule (event-driven): scheduling is triggered by the callback 
invoked when a task completes.
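
The two scheduling levels can be illustrated with a minimal Python sketch. It is not the Doris implementation: `StreamingJob`, `InlineExecutor`, `on_tick`, and the batch names are all hypothetical, and the executor runs tasks inline instead of on worker threads. It only shows the control flow, where a periodic tick starts work when the job is idle and each completion callback starts the next task without waiting for another tick.

```python
from collections import deque


class InlineExecutor:
    """Toy executor (hypothetical): queues tasks and runs them in drain()."""

    def __init__(self):
        self.queue = deque()

    def submit(self, batch, callback):
        self.queue.append((batch, callback))

    def drain(self):
        while self.queue:
            batch, callback = self.queue.popleft()
            # A real task would execute the INSERT for this batch here.
            callback()


class StreamingJob:
    """Hypothetical job that chains its tasks through completion callbacks."""

    def __init__(self, name, batches):
        self.name = name
        self.pending = deque(batches)
        self.in_flight = None
        self.completed = []

    def on_tick(self, executor):
        # Time-driven level: the periodic tick only starts work if idle.
        if self.in_flight is None and self.pending:
            self._submit_next(executor)

    def _submit_next(self, executor):
        self.in_flight = self.pending.popleft()
        executor.submit(self.in_flight,
                        callback=lambda: self._on_task_done(executor))

    def _on_task_done(self, executor):
        # Event-driven level: completion immediately schedules the next task.
        self.completed.append(self.in_flight)
        self.in_flight = None
        if self.pending:
            self._submit_next(executor)


job = StreamingJob("example_job", ["batch-1", "batch-2", "batch-3"])
executor = InlineExecutor()
job.on_tick(executor)   # one tick starts the first task
executor.drain()        # the callbacks chain the remaining tasks
```

In this sketch a single tick is enough to process all three batches, because every later task is started by the previous task's callback rather than by the timer.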
   
   ### Offset management
   Exactly-once semantics are achieved through persisted offsets plus 
two-phase task verification:
   
   - An offset is committed only after the data is visible and durable in 
Doris.
   
   - Each task carries a monotonically increasing ID; the scheduler rejects 
any duplicate or out-of-order task, eliminating replay risk.
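
The two rules above can be sketched as follows. This is an illustration under stated assumptions, not the actual design: `OffsetTracker`, `new_task`, and `try_commit` are hypothetical names, the `data_visible` flag stands in for whatever check confirms that the load is visible and durable, and the state is kept in memory where a real system would persist it.

```python
class OffsetTracker:
    """Hypothetical tracker for the committed offset and the last task ID."""

    def __init__(self, start_offset=0):
        self.committed_offset = start_offset
        self.last_committed_task_id = 0
        self._next_task_id = 1

    def new_task(self):
        # Hand out a strictly increasing task ID and the offset to read from.
        task_id = self._next_task_id
        self._next_task_id += 1
        return task_id, self.committed_offset

    def try_commit(self, task_id, end_offset, data_visible):
        # Reject duplicates and out-of-order tasks: the ID must be newer
        # than the last one that committed.
        if task_id <= self.last_committed_task_id:
            return False
        # Advance the offset only once the data is visible and durable.
        if not data_visible:
            return False
        # Assumption of this sketch: offsets never move backwards.
        if end_offset < self.committed_offset:
            return False
        self.committed_offset = end_offset
        self.last_committed_task_id = task_id
        return True


tracker = OffsetTracker()
t1, start1 = tracker.new_task()
not_visible = tracker.try_commit(t1, 100, data_visible=False)  # rejected
first = tracker.try_commit(t1, 100, data_visible=True)         # accepted
replay = tracker.try_commit(t1, 100, data_visible=True)        # duplicate
t2, start2 = tracker.new_task()
```

Because the offset advances only inside a successful `try_commit`, a task that fails before its data becomes visible leaves the offset untouched, so the same range can be retried, while a replayed commit for an already committed task ID is ignored.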
   
   
   ### Use case
   
   _No response_
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
