This is an automated email from the ASF dual-hosted git repository.

hello-stephen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
     new 4f8c14451a7 [fix](regression) handle cumulative delete-version compaction wait (#64945)
4f8c14451a7 is described below

commit 4f8c14451a73354f52270f0c94bda5b9c5063da0
Author: shuke <[email protected]>
AuthorDate: Wed Jul 1 21:17:58 2026 +0800

    [fix](regression) handle cumulative delete-version compaction wait (#64945)
    
    # [fix](regression) handle cumulative delete-version compaction wait
    
    ## Summary
    
    Fix `trigger_and_wait_compaction` so that, when Cloud cumulative
    compaction meets a delete version, the helper no longer waits out the
    300s timeout after valid progress has already happened.
    
    When cumulative compaction meets a delete version, BE can return
    `[E-2010] cumulative compaction meet delete version`, advance the
    cumulative point, and leave the rowsets to base compaction. On that
    path the cumulative success/failure timestamps may not change, so the
    old helper kept polling even after base compaction had completed and
    `run_status` was `false`.
    
    This patch treats `E-2010`, an advanced cumulative point, and a changed
    base success time together as a completed cumulative delete-version
    path, while still waiting whenever `run_status=true`. If `E-2010`
    advances the cumulative point but the base success time has not
    changed yet, the helper keeps waiting even if the cumulative failure
    timestamp changed.
    
    ## Root Cause
    
    The case `compaction/test_compacation_with_delete.groovy` creates
    alternating data and delete rowsets, then calls
    `trigger_and_wait_compaction(tableName, "cumulative")`.
    
    In Cloud mode this can legitimately proceed as follows:
    
    1. cumulative compaction meets delete version and returns `E-2010`
    2. cumulative point advances
    3. base compaction handles the delete-version rowsets
    
    The helper only watched cumulative success/failure timestamp changes. In
    the failing log, base compaction completed in 448 ms, but the helper
    waited for 5 minutes because the cumulative timestamps did not change.
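
    The pre-patch check can be reproduced in isolation. This is a minimal
    sketch, not the helper itself: the status keys follow the lookups in
    `plugin_compaction.groovy`, while the snapshot values and the closure
    name `oldHelperStillRunning` are invented for illustration.

    ```groovy
    // Hypothetical tablet status before triggering compaction (values are made up).
    def oldStatus = [
        "cumulative point"            : "2",
        "last cumulative success time": "2026-07-01 21:00:00.000",
        "last cumulative failure time": "1970-01-01 08:00:00.000",
        "last base success time"      : "1970-01-01 08:00:00.000"
    ]
    // Hypothetical status after the E-2010 handoff: the cumulative point and the
    // base success time moved, but neither cumulative timestamp did.
    def newStatus = oldStatus + [
        "cumulative point"      : "7",
        "last base success time": "2026-07-01 21:00:01.000"
    ]

    // Pre-patch rule: unchanged success AND failure timestamps mean "still running".
    def oldHelperStillRunning = { Map before, Map after, String type ->
        def successKey = "last " + type + " success time"
        def failureKey = "last " + type + " failure time"
        (before[successKey] == after[successKey]) && (before[failureKey] == after[failureKey])
    }

    // The old rule never sees the base compaction result, so it keeps waiting.
    assert oldHelperStillRunning(oldStatus, newStatus, "cumulative")
    ```

    With these snapshots the old rule reports "still running" on every poll,
    which matches the wait until timeout described above.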
    
    ## Validation
    
    - `git diff --check`
    - `git diff --check origin/master..HEAD`
    - Local Groovy condition simulation:
      - `E-2010 + cumulative point advanced + base success time changed +
        run_status=false` exits wait
      - `E-2010 + cumulative point advanced` keeps waiting if base success
        time has not changed
      - `E-2010 + cumulative point advanced + cumulative failure time changed`
        still keeps waiting if base success time has not changed
      - `run_status=true` keeps waiting even if the
        delete-version/base-success condition is met
      - normal cumulative success timestamp change still exits wait when there
        is no delete-version handoff
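
    The simulated conditions can be written as a small truth-table check.
    This is a sketch under assumptions: `waitFinished` and its flag names
    are hypothetical, the real helper derives the flags from tablet status
    JSON, and the `runStatus` short-circuit is assumed from the summary
    rather than taken from the diff.

    ```groovy
    // Hypothetical stand-in for the post-patch wait decision.
    // Missing flags are treated as false (Groovy truth on null).
    def waitFinished = { Map s ->
        if (s.runStatus) {
            return false  // assumption: an active run always keeps the helper waiting
        }
        def handedOff = s.e2010 && s.pointAdvanced            // E-2010 moved the cumulative point
        def completedByBase = handedOff && s.baseSuccessChanged
        // Without a handoff, fall back to the cumulative success/failure timestamps.
        return completedByBase || (!handedOff && s.cumulativeTimestampChanged)
    }

    // The five cases from the list above, in order.
    assert waitFinished([e2010: true, pointAdvanced: true, baseSuccessChanged: true])
    assert !waitFinished([e2010: true, pointAdvanced: true])
    assert !waitFinished([e2010: true, pointAdvanced: true, cumulativeTimestampChanged: true])
    assert !waitFinished([e2010: true, pointAdvanced: true, baseSuccessChanged: true, runStatus: true])
    assert waitFinished([cumulativeTimestampChanged: true])
    ```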
    
    Cloud P0 rerun is still needed for final validation.
---
 regression-test/plugins/plugin_compaction.groovy | 29 +++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/regression-test/plugins/plugin_compaction.groovy b/regression-test/plugins/plugin_compaction.groovy
index e57523416b0..d8ee5951222 100644
--- a/regression-test/plugins/plugin_compaction.groovy
+++ b/regression-test/plugins/plugin_compaction.groovy
@@ -128,6 +128,16 @@ Suite.metaClass.trigger_and_wait_compaction = { String table_name, String compac
 
     // 3. wait all compaction finished
     def running = triggered_tablets.size() > 0
+    def toLongOrNull = { value ->
+        if (value == null) {
+            return null
+        }
+        try {
+            return value.toString().trim().toLong()
+        } catch (Throwable ignored) {
+            return null
+        }
+    }
     Awaitility.await().atMost(timeout_seconds, TimeUnit.SECONDS).pollInterval(1, TimeUnit.SECONDS).until(() -> {
         for (tablet in triggered_tablets) {
             def be_host = backendId_to_backendIP["${tablet.BackendId}"]
@@ -146,9 +156,26 @@ Suite.metaClass.trigger_and_wait_compaction = { String table_name, String compac
                 def tabletStatus = parseJson(stdout.trim())
                 def oldStatus = be_tablet_compaction_status.get("${be_host}-${tablet.TabletId}")
                 // last compaction success/failure time isn't updated, indicates compaction is not started(so we treat it as running and wait)
+                def handedOffToBaseCompactionAfterDeleteVersion = false
+                def completedByBaseCompactionAfterDeleteVersion = false
+                if (compaction_type == "cumulative") {
+                    def oldCumulativePoint = toLongOrNull(oldStatus["cumulative point"])
+                    def newCumulativePoint = toLongOrNull(tabletStatus["cumulative point"])
+                    def lastCumulativeStatus = "${tabletStatus["last cumulative status"]}".toLowerCase()
+                    def baseSuccessTimeChanged = oldStatus["last base success time"] != tabletStatus["last base success time"]
+                    // E-2010 advances the cumulative point and lets base compaction handle delete-version rowsets.
+                    handedOffToBaseCompactionAfterDeleteVersion = lastCumulativeStatus.contains("e-2010") &&
+                            oldCumulativePoint != null && newCumulativePoint != null &&
+                            newCumulativePoint > oldCumulativePoint
+                    completedByBaseCompactionAfterDeleteVersion =
+                            handedOffToBaseCompactionAfterDeleteVersion && baseSuccessTimeChanged
+                }
                 def success_time_unchanged = (oldStatus["last ${compaction_type} success time"] == tabletStatus["last ${compaction_type} success time"])
                 def failure_time_unchanged = (oldStatus["last ${compaction_type} failure time"] == tabletStatus["last ${compaction_type} failure time"])
-                running = running || (success_time_unchanged && failure_time_unchanged)
+                def currentCompactionTimestampChanged = !success_time_unchanged || !failure_time_unchanged
+                def compactionFinished = completedByBaseCompactionAfterDeleteVersion ||
+                        (!handedOffToBaseCompactionAfterDeleteVersion && currentCompactionTimestampChanged)
+                running = running || !compactionFinished
                 if (running) {
                     logger.info("compaction is still running, be host: ${be_host}, tablet id: ${tablet.TabletId}, run status: ${compactionStatus.run_status}, old status: ${oldStatus}, new status: ${tabletStatus}")
                     return false

