github-actions[bot] commented on code in PR #61603:
URL: https://github.com/apache/doris/pull/61603#discussion_r2984640001


##########
fe/fe-core/src/test/java/org/apache/doris/common/profile/SummaryProfileTest.java:
##########
@@ -24,40 +24,106 @@
 public class SummaryProfileTest {
 
     @Test
-    public void testPlanSummary() {
+    public void testPlanSummaryWithQueryTrace() {
         SummaryProfile profile = new SummaryProfile();
-        profile.setQueryBeginTime(1);
-        profile.setParseSqlStartTime(3);
-        profile.setParseSqlFinishTime(6);
-        profile.setNereidsLockTableFinishTime(10);
-        profile.setNereidsAnalysisTime(15);
-        profile.setNereidsRewriteTime(21);
-        profile.setNereidsCollectTablePartitionFinishTime(28);
-        profile.setNereidsPreRewriteByMvFinishTime(31);
-        profile.setNereidsOptimizeTime(36);
-        profile.setNereidsTranslateTime(45);
-        profile.setNereidsDistributeTime(55);
-        profile.setQueryPlanFinishTime(66);
-        profile.setQueryScheduleFinishTime(78);
-        profile.setQueryFetchResultFinishTime(91);
-
-        profile.addCollectTablePartitionTime(7);
-        // update summary time
+        QueryTrace qt = profile.getQueryTrace();
+
+        // Simulate Parse SQL Time
+        qt.recordDuration("Parse SQL Time", 3);
+
+        // Simulate Plan sub-phase durations using QueryTrace
+        qt.recordDuration("Nereids Lock Table Time", 4);
+        qt.recordDuration("Nereids Analysis Time", 5);
+        qt.recordDuration("Nereids Rewrite Time", 6);
+        qt.recordDuration("Nereids Collect Table Partition Time", 7);
+        qt.recordDuration("Nereids Pre Rewrite By MV Time", 3);
+        qt.recordDuration("Nereids Optimize Time", 5);

Review Comment:
   **Bug: String literal mismatch**
   
   This uses `"Nereids Pre Rewrite By MV Time"` (uppercase "MV") but the 
constant `SummaryProfile.NEREIDS_PRE_REWRITE_BY_MV_TIME` is `"Nereids Pre 
Rewrite By Mv Time"` (mixed case "Mv"). The test records a duration under the 
wrong key, so it doesn't actually test this metric.
   
   Should be:
   ```java
   qt.recordDuration(SummaryProfile.NEREIDS_PRE_REWRITE_BY_MV_TIME, 3);
   ```
   
   In general, all `recordDuration` calls in this test should use the 
`SummaryProfile.*` constants instead of raw strings to prevent this kind of 
drift.



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/exploration/mv/InitMaterializationContextHook.java:
##########
@@ -77,8 +76,8 @@ public void afterRewrite(CascadesContext cascadesContext) {
         }
         StatementContext statementContext = 
cascadesContext.getStatementContext();
         if (statementContext.getConnectContext().getExecutor() != null) {
-            
statementContext.getConnectContext().getExecutor().getSummaryProfile()
-                    
.setNereidsCollectTablePartitionFinishTime(TimeUtils.getStartTimeMs());
+            // The "collect table partition" timing is recorded by the Nereids 
rewrite phase
+            // as a cumulative span via QueryTrace.
         }

Review Comment:
   **Dead code + missing timing**
   
   This if-block now contains only a comment and no actual code — it should be 
removed entirely.
   
   Also note: the `MaterializedViewUtils.collectTableUsedPartitions()` call at 
line 75 is no longer timed. Previously, 
`setNereidsCollectTablePartitionFinishTime()` captured the overall wall-clock 
time including this call. Now only the `collectTableUsedPartitions` call inside 
`AbstractMaterializedViewRule.doRewrite()` is timed. This means the timing from 
this hook path is lost.



##########
fe/fe-core/src/main/java/org/apache/doris/common/profile/SummaryProfile.java:
##########
@@ -578,12 +499,6 @@ private void updateExecutionSummaryProfile() {
         executionSummaryProfile.addInfoString(FRAGMENT_COMPRESSED_SIZE,
                 RuntimeProfile.printCounter(fragmentCompressedSize, 
TUnit.BYTES));
         executionSummaryProfile.addInfoString(FRAGMENT_RPC_COUNT, "" + 
fragmentRpcCount);
-        executionSummaryProfile.addInfoString(WAIT_FETCH_RESULT_TIME,
-                getPrettyTime(queryFetchResultFinishTime, 
queryScheduleFinishTime, TUnit.TIME_MS));

Review Comment:
   **Bug: `WAIT_FETCH_RESULT_TIME` metric dropped**
   
   The old code emitted:
   ```java
   executionSummaryProfile.addInfoString(WAIT_FETCH_RESULT_TIME,
       getPrettyTime(queryFetchResultFinishTime, queryScheduleFinishTime, 
TUnit.TIME_MS));
   ```
   
   This line was removed but no corresponding `QueryTrace.recordDuration("Wait 
and Fetch Result Time", ...)` was added anywhere. The 
`setQueryFetchResultFinishTime()` and `setQueryScheduleFinishTime()` setters 
are still being called from `StmtExecutor` and `AbstractInsertExecutor`, but 
the computed metric is never emitted to the profile.
   
   This metric (the wall-clock time from schedule finish to fetch finish) is 
useful for understanding query latency and should be migrated to QueryTrace.



##########
fe/fe-core/src/main/java/org/apache/doris/common/profile/QueryTrace.java:
##########
@@ -0,0 +1,359 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.common.profile;
+
+import org.apache.doris.thrift.TUnit;
+
+import com.google.common.base.Preconditions;
+
+import java.util.ArrayDeque;
+import java.util.Deque;
+import java.util.LinkedHashMap;
+import java.util.concurrent.locks.ReentrantReadWriteLock;
+
+/**
+ * A per-query tracing container that dynamically collects timing spans, 
counters and text attributes
+ * during FE-side query execution. Replaces the hardcoded field-based approach 
in SummaryProfile.
+ *
+ * <h3>Core Concepts:</h3>
+ * <ul>
+ *   <li><b>Span</b> – A timed phase, created via {@link #startSpan(String)} 
with try-with-resources.
+ *       Multiple starts of the same name accumulate durations.</li>
+ *   <li><b>Counter</b> – A numeric metric (count, bytes, rate), set or 
accumulated.</li>
+ *   <li><b>Text</b> – A string attribute (version, workload group name, 
etc.).</li>
+ * </ul>
+ *
+ * <h3>ThreadLocal Scope:</h3>
+ * <p>A ThreadLocal Span stack automatically tracks the current active span.
+ * When {@link #startSpan(String)} is called, the stack-top span is used as 
the parent.
+ * This enables cross-class span nesting without parameter passing.</p>
+ *
+ * <h3>Display Order and Indentation:</h3>
+ * <ul>
+ *   <li>Display order = insertion order (first-seen order, maintained by 
LinkedHashMap)</li>
+ *   <li>Indentation = parent chain depth (computed at registration time)</li>
+ * </ul>
+ *
+ * <p>Usage:
+ * <pre>{@code
+ * // Timed span
+ * try (ProfileSpan span = trace.startSpan("Plan Time")) {
+ *     try (ProfileSpan child = trace.startSpan("Nereids Analysis Time")) {
+ *         // automatically a child of "Plan Time"
+ *     }
+ * }
+ *
+ * // Counter
+ * trace.setCounter("Fragment Compressed Size", size, TUnit.BYTES);
+ * trace.addCounter("Fragment RPC Count", 1);
+ *
+ * // Text
+ * trace.setText("Doris Version", version);
+ * }</pre>
+ */
+public class QueryTrace {
+
+    /**
+     * Type of a registered metric entry.
+     */
+    enum MetricType {
+        SPAN,       // Timed span (AutoCloseable, durations accumulate)
+        COUNTER,    // Numeric counter / gauge
+        TEXT        // String attribute
+    }
+
+    /**
+     * Internal data holder for a single registered metric.
+     */
+    static class MetricEntry {
+        final String name;
+        final String parentName;   // null for root-level entries
+        final MetricType type;
+        final TUnit unit;
+        final int indentLevel;     // computed from parent chain depth
+
+        // Numeric value storage (for SPAN durations and COUNTER values).
+        // Use volatile for thread safety on simple set/get.
+        // For accumulated spans/counters, synchronized(this) on the entry 
provides atomicity.
+        volatile long numericValue;
+        // Text value storage
+        volatile String textValue;
+
+        MetricEntry(String name, String parentName, MetricType type, TUnit 
unit, int indentLevel) {
+            this.name = name;
+            this.parentName = parentName;
+            this.type = type;
+            this.unit = unit;
+            this.indentLevel = indentLevel;
+            this.numericValue = 0;
+            this.textValue = null;
+        }
+
+        String format() {
+            switch (type) {
+                case SPAN:
+                case COUNTER:
+                    return RuntimeProfile.printCounter(numericValue, unit);
+                case TEXT:
+                    return textValue != null ? textValue : "N/A";
+                default:
+                    return "N/A";
+            }
+        }
+    }
+
+    // ---- All metrics, keyed by name, insertion-ordered for display ----
+    private final LinkedHashMap<String, MetricEntry> metrics = new 
LinkedHashMap<>();
+    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
+
+    // ---- ThreadLocal Span stack for automatic parent detection ----
+    private final ThreadLocal<Deque<ProfileSpan>> spanStack = 
ThreadLocal.withInitial(ArrayDeque::new);
+

Review Comment:
   **Minor: ThreadLocal leak potential**
   
   This `ThreadLocal` is an instance field of `QueryTrace`. When `QueryTrace` 
is GC'd, entries in pooled worker threads' `ThreadLocalMap` won't be cleaned up 
until those threads access the ThreadLocal again (which they won't, since the 
QueryTrace reference is gone). The entry will be cleaned up eventually when the 
thread's ThreadLocalMap performs expunge-stale-entries, but this is 
non-deterministic.
   
   For long-lived thread pools, consider explicitly clearing the stack at the 
end of query execution (e.g., a `cleanup()` method), or at minimum documenting 
this behavior.



##########
fe/fe-core/src/main/java/org/apache/doris/common/profile/QueryTrace.java:
##########
@@ -0,0 +1,359 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.common.profile;
+
+import org.apache.doris.thrift.TUnit;
+
+import com.google.common.base.Preconditions;
+
+import java.util.ArrayDeque;
+import java.util.Deque;
+import java.util.LinkedHashMap;
+import java.util.concurrent.locks.ReentrantReadWriteLock;
+
+/**
+ * A per-query tracing container that dynamically collects timing spans, 
counters and text attributes
+ * during FE-side query execution. Replaces the hardcoded field-based approach 
in SummaryProfile.
+ *
+ * <h3>Core Concepts:</h3>
+ * <ul>
+ *   <li><b>Span</b> – A timed phase, created via {@link #startSpan(String)} 
with try-with-resources.
+ *       Multiple starts of the same name accumulate durations.</li>
+ *   <li><b>Counter</b> – A numeric metric (count, bytes, rate), set or 
accumulated.</li>
+ *   <li><b>Text</b> – A string attribute (version, workload group name, 
etc.).</li>
+ * </ul>
+ *
+ * <h3>ThreadLocal Scope:</h3>
+ * <p>A ThreadLocal Span stack automatically tracks the current active span.
+ * When {@link #startSpan(String)} is called, the stack-top span is used as 
the parent.
+ * This enables cross-class span nesting without parameter passing.</p>
+ *
+ * <h3>Display Order and Indentation:</h3>
+ * <ul>
+ *   <li>Display order = insertion order (first-seen order, maintained by 
LinkedHashMap)</li>
+ *   <li>Indentation = parent chain depth (computed at registration time)</li>
+ * </ul>
+ *
+ * <p>Usage:
+ * <pre>{@code
+ * // Timed span
+ * try (ProfileSpan span = trace.startSpan("Plan Time")) {
+ *     try (ProfileSpan child = trace.startSpan("Nereids Analysis Time")) {
+ *         // automatically a child of "Plan Time"
+ *     }
+ * }
+ *
+ * // Counter
+ * trace.setCounter("Fragment Compressed Size", size, TUnit.BYTES);
+ * trace.addCounter("Fragment RPC Count", 1);
+ *
+ * // Text
+ * trace.setText("Doris Version", version);
+ * }</pre>
+ */
+public class QueryTrace {
+
+    /**
+     * Type of a registered metric entry.
+     */
+    enum MetricType {
+        SPAN,       // Timed span (AutoCloseable, durations accumulate)
+        COUNTER,    // Numeric counter / gauge
+        TEXT        // String attribute
+    }
+
+    /**
+     * Internal data holder for a single registered metric.
+     */
+    static class MetricEntry {
+        final String name;
+        final String parentName;   // null for root-level entries
+        final MetricType type;
+        final TUnit unit;
+        final int indentLevel;     // computed from parent chain depth
+
+        // Numeric value storage (for SPAN durations and COUNTER values).
+        // Use volatile for thread safety on simple set/get.
+        // For accumulated spans/counters, synchronized(this) on the entry 
provides atomicity.
+        volatile long numericValue;
+        // Text value storage
+        volatile String textValue;
+
+        MetricEntry(String name, String parentName, MetricType type, TUnit 
unit, int indentLevel) {
+            this.name = name;
+            this.parentName = parentName;
+            this.type = type;
+            this.unit = unit;
+            this.indentLevel = indentLevel;
+            this.numericValue = 0;
+            this.textValue = null;
+        }
+
+        String format() {
+            switch (type) {
+                case SPAN:
+                case COUNTER:
+                    return RuntimeProfile.printCounter(numericValue, unit);
+                case TEXT:
+                    return textValue != null ? textValue : "N/A";
+                default:
+                    return "N/A";
+            }
+        }
+    }
+
+    // ---- All metrics, keyed by name, insertion-ordered for display ----
+    private final LinkedHashMap<String, MetricEntry> metrics = new 
LinkedHashMap<>();
+    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
+
+    // ---- ThreadLocal Span stack for automatic parent detection ----
+    private final ThreadLocal<Deque<ProfileSpan>> spanStack = 
ThreadLocal.withInitial(ArrayDeque::new);
+
+    // ============================================================
+    // Span API
+    // ============================================================
+
+    /**
+     * Start a new timed span. The parent is automatically detected from the 
ThreadLocal span stack.
+     * Must be used with try-with-resources.
+     *
+     * <p>If the same span name is started multiple times, durations 
accumulate.
+     *
+     * @param name the display name of this span (e.g. "Nereids Analysis Time")
+     * @return a ProfileSpan that must be closed via try-with-resources
+     */
+    public ProfileSpan startSpan(String name) {
+        ProfileSpan parentSpan = peekCurrentSpan();
+        String parentName = parentSpan != null ? parentSpan.getName() : null;
+
+        ensureRegistered(name, parentName, MetricType.SPAN, TUnit.TIME_MS);
+
+        ProfileSpan span = new ProfileSpan(name, this, parentSpan);
+        pushCurrentSpan(span);
+        return span;
+    }
+
+    /**
+     * Record a pre-computed duration directly (for cases where 
try-with-resources is not feasible,
+     * e.g. GC time computed as a delta between two JMX readings).
+     *
+     * <p>The parent is inferred from the current span stack. Multiple calls 
accumulate.
+     *
+     * @param name the display name (e.g. "Garbage Collect During Plan Time")
+     * @param durationMs the duration in milliseconds
+     */
+    public void recordDuration(String name, long durationMs) {
+        String parentName = getCurrentSpanName();
+        ensureRegistered(name, parentName, MetricType.SPAN, TUnit.TIME_MS);
+        MetricEntry entry = getEntry(name);
+        if (entry != null) {
+            synchronized (entry) {
+                entry.numericValue += durationMs;
+            }
+        }
+    }
+
+    // ============================================================
+    // Counter API
+    // ============================================================
+
+    /**
+     * Set a counter value (overwrite semantics). Auto-registers on first use.
+     * Parent is inferred from the current span stack.
+     *
+     * @param name the display name (e.g. "Fragment Compressed Size")
+     * @param value the value
+     * @param unit the display unit (e.g. TUnit.BYTES)
+     */
+    public void setCounter(String name, long value, TUnit unit) {
+        String parentName = getCurrentSpanName();
+        ensureRegistered(name, parentName, MetricType.COUNTER, unit);
+        MetricEntry entry = getEntry(name);
+        Preconditions.checkNotNull(entry);
+        entry.numericValue = value;
+    }
+
+    /**
+     * Add to a counter (accumulate semantics). Default unit is TUnit.UNIT.
+     */
+    public void addCounter(String name, long delta) {
+        addCounter(name, delta, TUnit.UNIT);
+    }
+
+    /**
+     * Add to a counter with a specified unit.
+     */
+    public void addCounter(String name, long delta, TUnit unit) {
+        String parentName = getCurrentSpanName();
+        ensureRegistered(name, parentName, MetricType.COUNTER, unit);
+        MetricEntry entry = getEntry(name);
+        Preconditions.checkNotNull(entry);
+        synchronized (entry) {
+            entry.numericValue += delta;
+        }
+    }
+
+    // ============================================================
+    // Text API
+    // ============================================================
+
+    /**
+     * Set a text attribute. Auto-registers on first use.
+     *
+     * @param name the display name (e.g. "Doris Version")
+     * @param value the text value
+     */
+    public void setText(String name, String value) {
+        String parentName = getCurrentSpanName();
+        ensureRegistered(name, parentName, MetricType.TEXT, TUnit.NONE);
+        MetricEntry entry = getEntry(name);
+        Preconditions.checkNotNull(entry);
+        entry.textValue = value;
+    }
+
+    // ============================================================
+    // Data Read API (for MetricRepo reporting, etc.)
+    // ============================================================
+
+    /**
+     * Get the duration in milliseconds for a span. Returns -1 if not found.
+     */
+    public long getDurationMs(String name) {
+        MetricEntry entry = getEntry(name);
+        return entry != null ? entry.numericValue : -1;
+    }
+
+    /**
+     * Get the counter value. Returns 0 if not found.
+     */
+    public long getCounterValue(String name) {
+        MetricEntry entry = getEntry(name);
+        return entry != null ? entry.numericValue : 0;
+    }
+
+    /**
+     * Get the text value. Returns null if not found.
+     */
+    public String getTextValue(String name) {
+        MetricEntry entry = getEntry(name);
+        return entry != null ? entry.textValue : null;
+    }
+
+    // ============================================================
+    // Output: populate RuntimeProfile
+    // ============================================================
+
+    /**
+     * Populate a RuntimeProfile with all registered metrics, in insertion 
order.
+     * Indentation is automatically applied based on parent-child depth.
+     *
+     * @param profile the RuntimeProfile to populate (typically the "Execution 
Summary" profile)
+     */
+    public void populateProfile(RuntimeProfile profile) {
+        lock.readLock().lock();
+        try {
+            for (MetricEntry entry : metrics.values()) {
+                String formatted = entry.format();
+                if (formatted != null) {
+                    profile.addInfoString(entry.name, formatted, 
entry.indentLevel);
+                }
+            }
+        } finally {
+            lock.readLock().unlock();
+        }
+    }
+
+    // ============================================================
+    // Internal: Span stack management
+    // ============================================================
+
+    void pushCurrentSpan(ProfileSpan span) {
+        spanStack.get().push(span);
+    }
+
+    void popCurrentSpan(ProfileSpan span) {
+        Deque<ProfileSpan> stack = spanStack.get();
+        if (!stack.isEmpty() && stack.peek() == span) {
+            stack.pop();

Review Comment:
   **Silent failure on mismatched pop**
   
   If `stack.peek() != span` (e.g., due to an exception that prevents a child 
span from closing before a parent), the pop is silently skipped. This would 
leave stale spans on the stack, corrupting parent detection for all subsequent 
spans on this thread.
   
   Consider at minimum logging a warning when a mismatch occurs. For 
robustness, you could also pop up to and including the target span (clearing 
any leaked child spans).



##########
fe/fe-core/src/main/java/org/apache/doris/common/profile/SummaryProfile.java:
##########
@@ -611,33 +526,17 @@ private void updateExecutionSummaryProfile() {
             executionSummaryProfile.addInfoString(GET_TABLE_VERSION_TIME, 
getPrettyGetTableVersionTime());
             executionSummaryProfile.addInfoString(GET_TABLE_VERSION_COUNT, 
getPrettyGetTableVersionCount());
         }
+
+        // Populate dynamically registered metrics from QueryTrace
+        if (queryTrace != null) {
+            queryTrace.populateProfile(executionSummaryProfile);
+        }

Review Comment:
   **Display ordering concern**
   
   Previously, `updateExecutionSummaryProfile()` emitted metrics in a 
deliberate order matching `EXECUTION_SUMMARY_KEYS` (Parse -> Plan -> sub-phases 
-> Schedule -> Fetch -> Write). Now, non-migrated metrics 
(SCHEDULE_TIME_PER_BE, ASSIGN_FRAGMENT_TIME, FRAGMENT_* etc.) are emitted 
first, then `queryTrace.populateProfile()` appends QueryTrace metrics in 
runtime insertion order.
   
   This means `Parse SQL Time` and `Plan Time` will now appear _after_ 
fragment/schedule-related metrics in the profile output, which is a 
user-visible ordering change. The PR description says "profile output format is 
preserved" but this is not accurate.
   
   Consider either:
   1. Emitting QueryTrace metrics first, or
   2. Using the `EXECUTION_SUMMARY_KEYS` list to control output order during 
`populateProfile()`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to