shuke987 opened a new pull request, #64515:
URL: https://github.com/apache/doris/pull/64515

   ## Problem
   `test_delete_bitmap_metrics` is flaky in the branch-4.1 P0 regression run. It 
reads the per-replica aggregated delete-bitmap cache 
(`/api/delete_bitmap/count_agg_cache`) and asserts `delete_bitmap_count == 8` 
on **every** replica of the tablet. However, that agg cache is populated 
**lazily, only on the replica that actually served a query**. On a 
multi-replica cluster (`force_olap_table_replication_num`), the `qt_sql` select 
before the loop warms only one replica, so the other replicas still report `0` 
and the assertion fails. Which replica serves the query is non-deterministic, 
so the test passes or fails from run to run.
   
   ## Fix
   Before the per-replica assertions, warm every replica by pinning the read to 
each replica ordinal (`use_fix_replica`) and running a select, so each 
replica's agg cache is populated. The assertions themselves are unchanged.
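   
   The effect of the warming step can be illustrated with a toy model. This is 
only a sketch in Python, not the actual Groovy suite or any Doris API: the 
`Replica` class, `default_select`, and `warm_all` are hypothetical names, and 
the model simply assumes that serving a select fills that replica's agg cache 
with the expected count.

```python
import random

EXPECTED_COUNT = 8  # value the test asserts on every replica


class Replica:
    """Toy stand-in for one tablet replica with a lazily filled agg cache."""

    def __init__(self):
        self.agg_cache_count = 0  # stays 0 until this replica serves a query

    def serve_select(self):
        # Assumption: serving a query populates the aggregated cache.
        self.agg_cache_count = EXPECTED_COUNT


def default_select(replicas, rng):
    # Without pinning, the serving replica is chosen non-deterministically.
    rng.choice(replicas).serve_select()


def warm_all(replicas):
    # Hypothetical analogue of pinning the read to each replica ordinal
    # (use_fix_replica) and running one select per ordinal.
    for ordinal in range(len(replicas)):
        replicas[ordinal].serve_select()


def counts(replicas):
    return [r.agg_cache_count for r in replicas]


replicas = [Replica() for _ in range(3)]
before = counts(replicas)            # nothing has been queried yet
default_select(replicas, random.Random(0))
after_one = counts(replicas)         # exactly one replica is warm
warm_all(replicas)
after_warm = counts(replicas)        # every replica is warm
```

   In this model, asserting the expected count on every replica holds only for 
`after_warm`, which is why the warming loop runs before the unchanged 
assertions.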
   
   ## Verification
   Reproduced and verified directly on a branch-4.1 cluster (replication 
forced to 3) by querying `count_agg_cache`:
   - before any select: all replicas report agg=0
   - after one default select: only the serving replica reports 8 (others 0)
   - after warming all replicas: all report 8
   
   The suite passes with the fix.
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

