Copilot commented on code in PR #61945:
URL: https://github.com/apache/doris/pull/61945#discussion_r3015021033


##########
fe/fe-core/src/main/java/org/apache/doris/catalog/CloudTabletStatMgr.java:
##########
@@ -394,4 +548,78 @@ private void 
filterTopTableStatsByDataSize(List<OlapTable.Statistics> newCloudTa
         }
         this.cloudTableStatsList = new ArrayList<>(topStats);
     }
+
+    public void addActiveTablets(List<Long> tabletIds) {
+        if (Config.cloud_get_tablet_stats_version == 1 || tabletIds == null || 
tabletIds.isEmpty()) {
+            return;
+        }
+        activeTablets.addAll(tabletIds);
+    }
+
+    // master FE send update tablet stats rpc to other FEs
+    private void pushTabletStats(GetTabletStatsResponse response) {
+        List<Frontend> frontends = getFrontends();
+        if (frontends == null || frontends.isEmpty()) {
+            return;
+        }
+        TSyncCloudTabletStatsRequest request = new 
TSyncCloudTabletStatsRequest();
+        request.setTabletStatsPb(ByteBuffer.wrap(response.toByteArray()));
+        for (Frontend fe : frontends) {

Review Comment:
   `pushTabletStats` sets `tabletStatsPb` using `ByteBuffer.wrap(...)`, but the 
receiving side (`FrontendServiceImpl.syncCloudTabletStats`) reads this field as 
a `byte[]` (`request.getTabletStatsPb()`). This looks like a type mismatch that 
will either not compile or will serialize incorrectly. Set the field using the 
expected `byte[]` payload (e.g., `response.toByteArray()`) and drop the 
`ByteBuffer` usage/import here.



##########
fe/fe-core/src/main/java/org/apache/doris/cloud/catalog/CloudReplica.java:
##########
@@ -69,6 +71,20 @@ public class CloudReplica extends Replica implements 
GsonPostProcessable {
     private long indexId = -1;
     @SerializedName(value = "idx")
     private long idx = -1;
+    // last time to get tablet stats
+    @Getter
+    @Setter
+    long lastGetTabletStatsTime = 0;
+    /**
+     * The index of {@link 
org.apache.doris.catalog.CloudTabletStatMgr#DEFAULT_INTERVAL_LADDER_MS} array.
+     * Used to control the interval of getting tablet stats.
+     * When get tablet stats:
+     * if the stats is unchanged, will update this index to next value to get 
stats less frequently;
+     * if the stats is changed, will update this index to 0 to get stats more 
frequently.
+     */
+    @Getter
+    @Setter
+    int statsIntervalIndex = 0;

Review Comment:
   New fields `lastGetTabletStatsTime` and `statsIntervalIndex` are 
package-private and lack `@SerializedName`, while `CloudReplica` is 
Gson-persisted and other fields use short `@SerializedName` keys. Make these 
fields `private` and add appropriate `@SerializedName` (with alternates if 
needed) to keep metadata serialization stable/backward-compatible.



##########
fe/fe-core/src/main/java/org/apache/doris/cloud/catalog/CloudReplica.java:
##########
@@ -69,6 +71,20 @@ public class CloudReplica extends Replica implements 
GsonPostProcessable {
     private long indexId = -1;
     @SerializedName(value = "idx")
     private long idx = -1;
+    // last time to get tablet stats
+    @Getter
+    @Setter
+    long lastGetTabletStatsTime = 0;
+    /**
+     * The index of {@link 
org.apache.doris.catalog.CloudTabletStatMgr#DEFAULT_INTERVAL_LADDER_MS} array.
+     * Used to control the interval of getting tablet stats.
+     * When get tablet stats:
+     * if the stats is unchanged, will update this index to next value to get 
stats less frequently;
+     * if the stats is changed, will update this index to 0 to get stats more 
frequently.
+     */

Review Comment:
   The Javadoc links to `CloudTabletStatMgr#DEFAULT_INTERVAL_LADDER_MS`, but 
that constant is `private` in `CloudTabletStatMgr`, so the Javadoc reference 
will be broken. Either make the constant accessible (public/protected) or 
remove/adjust the link text.



##########
fe/fe-core/src/main/java/org/apache/doris/catalog/CloudTabletStatMgr.java:
##########
@@ -55,20 +77,81 @@ public class CloudTabletStatMgr extends MasterDaemon {
     private volatile List<OlapTable.Statistics> cloudTableStatsList = new 
ArrayList<>();
 
     private static final ExecutorService GET_TABLET_STATS_THREAD_POOL = 
Executors.newFixedThreadPool(
-            Config.max_get_tablet_stat_task_threads_num);
+            Config.max_get_tablet_stat_task_threads_num,
+            new 
ThreadFactoryBuilder().setNameFormat("get-tablet-stats-%d").setDaemon(true).build());
+    // Master: send tablet stats to followers and observers
+    // Follower and observer: receive tablet stats from master
+    private static final ExecutorService SYNC_TABLET_STATS_THREAD_POOL = 
Executors.newFixedThreadPool(
+            Config.cloud_sync_tablet_stats_task_threads_num,
+            new 
ThreadFactoryBuilder().setNameFormat("sync-tablet-stats-%d").setDaemon(true).build());
+    private Set<Long> activeTablets = ConcurrentHashMap.newKeySet();
+
+    /**
+     * Interval ladder in milliseconds: 1m, 5m, 10m, 30m, 2h, 6h, 12h, 3d, 
infinite.
+     * Tablets with changing stats stay at lower intervals; stable tablets 
move to higher intervals.
+     */
+    private static final long[] DEFAULT_INTERVAL_LADDER_MS = {
+            TimeUnit.MINUTES.toMillis(1),    // 1 minute
+            TimeUnit.MINUTES.toMillis(5),    // 5 minutes
+            TimeUnit.MINUTES.toMillis(10),   // 10 minutes
+            TimeUnit.MINUTES.toMillis(30),   // 30 minutes
+            TimeUnit.HOURS.toMillis(2),      // 2 hours
+            TimeUnit.HOURS.toMillis(6),      // 6 hours
+            TimeUnit.HOURS.toMillis(12),     // 12 hours
+            TimeUnit.DAYS.toMillis(3),       // 3 days
+            Long.MAX_VALUE                   // infinite (never auto-fetch)
+    };
 
     public CloudTabletStatMgr() {
         super("cloud tablet stat mgr", 
Config.tablet_stat_update_interval_second * 1000);
     }
 
     @Override
     protected void runAfterCatalogReady() {
-        LOG.info("cloud tablet stat begin");
-        List<Long> dbIds = getAllTabletStats();
+        int version = Config.cloud_get_tablet_stats_version;
+        LOG.info("cloud tablet stat begin with version: {}", version);
+
+        // version1: get all tablet stats
+        if (version == 1) {
+            this.activeTablets.clear();
+            List<Long> dbIds = getAllTabletStats(null);
+            updateStatInfo(dbIds);
+            return;
+        }
+
+        // version2: get stats for active tablets
+        Set<Long> copiedTablets = new HashSet<>(activeTablets);
+        activeTablets.removeAll(copiedTablets);
+        getActiveTabletStats(copiedTablets);

Review Comment:
   `activeTablets` is cleared (`removeAll`) before 
`getActiveTabletStats(copiedTablets)` runs. If fetching tablet stats fails (RPC 
error / task failure), those tablet IDs are still removed and won’t be retried 
until they become active again, which can leave stats stale. Consider only 
removing IDs after a successful fetch, or re-adding failed IDs back to 
`activeTablets`.
   ```suggestion
           getActiveTabletStats(copiedTablets);
           activeTablets.removeAll(copiedTablets);
   ```



##########
fe/fe-common/src/main/java/org/apache/doris/common/Config.java:
##########
@@ -3554,6 +3553,13 @@ public static int metaServiceRpcRetryTimes() {
         "Maximal concurrent num of get tablet stat job."})
     public static int max_get_tablet_stat_task_threads_num = 4;
 
+    @ConfField(mutable = true, description = {"Version of getting tablet stats 
in cloud mode. "
+            + "Version 1: get all tablets; Version 2: get active and interval 
expired tablets"})
+    public static int cloud_get_tablet_stats_version = 2;
+
+    @ConfField(description = {"Maximum concurrent number of get tablet stat 
jobs."})

Review Comment:
   Config description for `cloud_sync_tablet_stats_task_threads_num` says "get 
tablet stat jobs", but this setting controls the concurrency of syncing/pushing 
tablet stats between FEs. Please update the description to match the actual 
behavior to avoid operator confusion.
   ```suggestion
       @ConfField(description = {"存算分离模式下 FE 之间同步 tablet 统计信息任务的最大并发数。",
               "Maximum concurrent number of syncing tablet stats between 
FEs."})
   ```



##########
fe/fe-core/src/main/java/org/apache/doris/service/FrontendServiceImpl.java:
##########
@@ -4543,16 +4546,28 @@ public TStatus 
reportCommitTxnResult(TReportCommitTxnResultRequest request) thro
             return new TStatus(TStatusCode.NOT_MASTER);
         }
 
-        LOG.info("receive load stats report request: {}, backend: {}, dbId: 
{}, txnId: {}, label: {}",
-                request, clientAddr, request.getDbId(), request.getTxnId(), 
request.getLabel());
+        if (LOG.isDebugEnabled()) {
+            LOG.debug("receive load stats report from backend: {}, dbId: {}, 
txnId: {}, label: {}, tabletIds: {}",
+                    clientAddr, request.getDbId(), request.getTxnId(), 
request.getLabel(), request.getTabletIds());
+        }
 
         try {
-            byte[] receivedProtobufBytes = request.getPayload();
-            if (receivedProtobufBytes == null || receivedProtobufBytes.length 
<= 0) {
-                return new TStatus(TStatusCode.INVALID_ARGUMENT);
+            List<Long> tabletIds = request.isSetTabletIds() ? 
request.getTabletIds() : Collections.emptyList();
+            if (LOG.isDebugEnabled()) {
+                LOG.debug("force sync tablet stats for txnId: {}, tabletNum: 
{}, tabletIds: {}", request.txnId,

Review Comment:
   This debug log references `request.txnId` directly instead of using the 
Thrift accessor (`getTxnId()`), which is inconsistent with the rest of the 
method and can be misleading when the field is unset/defaulted. Prefer 
`request.getTxnId()` for consistency.
   ```suggestion
                   LOG.debug("force sync tablet stats for txnId: {}, tabletNum: 
{}, tabletIds: {}", request.getTxnId(),
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to