github-actions[bot] commented on code in PR #64614:
URL: https://github.com/apache/doris/pull/64614#discussion_r3456782658


##########
be/src/load/channel/load_stream.cpp:
##########
@@ -445,12 +445,9 @@ void IndexStream::close(const std::vector<PTabletID>& 
tablets_to_commit,
     }
 }
 
-// TODO: Profile is temporary disabled, because:
-// 1. It's not being processed by the upstream for now
-// 2. There are some problems in _profile->to_thrift()
 LoadStream::LoadStream(const PUniqueId& load_id, LoadStreamMgr* 
load_stream_mgr,
                        bool enable_profile)
-        : _load_id(load_id), _enable_profile(false), 
_load_stream_mgr(load_stream_mgr) {
+        : _load_id(load_id), _enable_profile(enable_profile), 
_load_stream_mgr(load_stream_mgr) {

Review Comment:
   Enabling `_enable_profile` makes the `_report_result()` profile branch live, 
but that branch checks `_close_load_cnt == _total_streams` outside `_lock`. 
`close()` increments `_close_load_cnt` under `_lock`, and `_report_result()` 
runs from per-stream BRPC callbacks for the same `LoadStream`, so 
profile-enabled multi-stream loads can race on the close count and decide 
profile emission from an inconsistent value. There is also an error path in 
`_dispatch()` that calls `_report_failure()` while already holding `_lock`; 
after the final close count is reached, that can re-enter the profile 
serialization lock in `_report_result()`. Please compute/pass the final-close 
profile flag while holding `_lock` or guard this read with the same 
synchronization before enabling this path.



##########
be/src/exec/sink/writer/vtablet_writer_v2.cpp:
##########
@@ -806,6 +807,15 @@ Status VTabletWriterV2::_close_wait(
     SCOPED_TIMER(_close_load_timer);
     Status status;
     auto streams_for_node = _load_stream_map->get_streams_for_node();
+    auto streams_to_collect_profile = unfinished_streams;
+    Defer collect_load_stream_profiles {[&]() {
+        for (const auto& stream : streams_to_collect_profile) {
+            auto load_stream_profile = 
stream->collect_load_stream_profile(_state->profile_level());
+            if (load_stream_profile != nullptr) {

Review Comment:
   This updates the task profile directly with every remote tree, but each 
remote tree has the same root name (`LoadStream`) and children keyed only by 
index/tablet names (`IndexStream ...`, `TabletStream ...`). 
`RuntimeProfile::update()` merges children with matching names, so replicated 
writes to multiple destination BEs can collapse those per-BE profiles into one 
subtree and overwrite counters instead of showing one backend's stream profile 
per replica. The old `LoadChannel` profile path avoided that by wrapping data 
under `LoadChannel load_id=... (host=..., backend_id=...)`. Please wrap each 
collected stream profile in a backend/stream-unique child, or include the 
backend identity in the remote root, before merging it into 
`load_channel_profile`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to