github-actions[bot] commented on code in PR #64614:
URL: https://github.com/apache/doris/pull/64614#discussion_r3456782658
##########
be/src/load/channel/load_stream.cpp:
##########
@@ -445,12 +445,9 @@ void IndexStream::close(const std::vector<PTabletID>&
tablets_to_commit,
}
}
-// TODO: Profile is temporary disabled, because:
-// 1. It's not being processed by the upstream for now
-// 2. There are some problems in _profile->to_thrift()
LoadStream::LoadStream(const PUniqueId& load_id, LoadStreamMgr*
load_stream_mgr,
bool enable_profile)
- : _load_id(load_id), _enable_profile(false),
_load_stream_mgr(load_stream_mgr) {
+ : _load_id(load_id), _enable_profile(enable_profile),
_load_stream_mgr(load_stream_mgr) {
Review Comment:
Enabling `_enable_profile` makes the `_report_result()` profile branch live,
but that branch checks `_close_load_cnt == _total_streams` outside `_lock`.
`close()` increments `_close_load_cnt` under `_lock`, and `_report_result()`
runs from per-stream BRPC callbacks for the same `LoadStream`, so
profile-enabled multi-stream loads can race on the close count and decide
profile emission from an inconsistent value. There is also an error path in
`_dispatch()` that calls `_report_failure()` while already holding `_lock`;
after the final close count is reached, that can re-enter the profile
serialization lock in `_report_result()`. Please compute/pass the final-close
profile flag while holding `_lock` or guard this read with the same
synchronization before enabling this path.
##########
be/src/exec/sink/writer/vtablet_writer_v2.cpp:
##########
@@ -806,6 +807,15 @@ Status VTabletWriterV2::_close_wait(
SCOPED_TIMER(_close_load_timer);
Status status;
auto streams_for_node = _load_stream_map->get_streams_for_node();
+ auto streams_to_collect_profile = unfinished_streams;
+ Defer collect_load_stream_profiles {[&]() {
+ for (const auto& stream : streams_to_collect_profile) {
+ auto load_stream_profile =
stream->collect_load_stream_profile(_state->profile_level());
+ if (load_stream_profile != nullptr) {
Review Comment:
This updates the task profile directly with every remote tree, but each
remote tree has the same root name (`LoadStream`) and children keyed only by
index/tablet names (`IndexStream ...`, `TabletStream ...`).
`RuntimeProfile::update()` merges children with matching names, so replicated
writes to multiple destination BEs can collapse those per-BE profiles into one
subtree and overwrite counters instead of showing one backend's stream profile
per replica. The old `LoadChannel` profile path avoided that by wrapping data
under `LoadChannel load_id=... (host=..., backend_id=...)`. Please wrap each
collected stream profile in a backend/stream-unique child, or include the
backend identity in the remote root, before merging it into
`load_channel_profile`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]