wenzhenghu opened a new pull request, #64649:
URL: https://github.com/apache/doris/pull/64649
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary:
This PR adds a new workload policy condition
`be_scan_bytes_from_remote_storage`, which allows Doris to cancel queries
according to the amount of data read from remote storage by BE scan tasks. This
is useful for limiting external table queries that read too much data from
remote HDFS or object storage.
Implementation summary:
- Add a new BE-side workload metric type in Thrift for remote storage scan
bytes.
- Add FE workload policy parsing, validation, metadata mapping, and replay
support for `be_scan_bytes_from_remote_storage`.
- Add BE workload condition evaluation based on
`io_context()->scan_bytes_from_remote_storage()`.
- Add regression coverage using an existing Hive external `lineitem` table.
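
For reviewers, the BE-side evaluation described above can be pictured roughly as follows. This is a minimal, self-contained sketch and not the code in this PR: `QueryIoStats`, `CompareOp`, and `RemoteScanBytesCondition` are hypothetical names used only for illustration, and the set of comparison operators is an assumption. The only detail taken from the PR is that the compared value comes from `io_context()->scan_bytes_from_remote_storage()`.

```cpp
#include <cstdint>

// Hypothetical comparison operators for a workload policy condition.
enum class CompareOp { GREATER, GREATER_EQUAL, LESS, LESS_EQUAL, EQUAL };

// Hypothetical stand-in for the per-query I/O counters. In the PR the value
// is read from io_context()->scan_bytes_from_remote_storage().
struct QueryIoStats {
    int64_t scan_bytes_from_remote_storage = 0;
};

// Hypothetical condition object: compares the bytes a query has scanned
// from remote storage against a configured threshold.
class RemoteScanBytesCondition {
public:
    RemoteScanBytesCondition(CompareOp op, int64_t threshold_bytes)
            : _op(op), _threshold_bytes(threshold_bytes) {}

    // Returns true when the policy action (for example, cancelling the
    // query) should fire for the given counters.
    bool eval(const QueryIoStats& stats) const {
        const int64_t value = stats.scan_bytes_from_remote_storage;
        switch (_op) {
        case CompareOp::GREATER:
            return value > _threshold_bytes;
        case CompareOp::GREATER_EQUAL:
            return value >= _threshold_bytes;
        case CompareOp::LESS:
            return value < _threshold_bytes;
        case CompareOp::LESS_EQUAL:
            return value <= _threshold_bytes;
        case CompareOp::EQUAL:
            return value == _threshold_bytes;
        }
        return false; // Unknown operator: never trigger the action.
    }

private:
    CompareOp _op;
    int64_t _threshold_bytes;
};
```

In this sketch, a condition built with `CompareOp::GREATER` and a 1 GiB threshold would evaluate to true only once a query has read more than 1 GiB from remote storage; local reads do not count toward the value.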
### Release note
Support workload policy cancellation by BE remote storage scan bytes.
### Check List (For Author)
- Test:
    - FE UT: passed
    - BE UT: passed
    - Regression test: passed, `test_workload_policy_remote_scan_bytes`
    - Manual test: verified existing workload policy behavior and new remote
      scan bytes cancellation on a deployed Doris instance
- Behavior changed: Yes. Add a new workload policy condition
`be_scan_bytes_from_remote_storage`.
- Does this need documentation: Yes. The workload policy condition list
should be updated.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]