Copilot commented on code in PR #63019:
URL: https://github.com/apache/doris/pull/63019#discussion_r3194132981


##########
regression-test/suites/cloud_p0/cache/multi_cluster/warm_up/cluster/test_warm_up_cluster_event_cancel_expired.groovy:
##########
@@ -0,0 +1,203 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+import org.apache.doris.regression.suite.ClusterOptions
+import groovy.json.JsonSlurper
+
+// Covers a two-part bug on the event-driven warm-up path:
+//   1. BE side (cloud_warm_up_manager.cpp): `st.is<CANCELED>()` used the
+//      proto enum value PCacheStatus::CANCELED=9 instead of
+//      ErrorCode::CANCELLED=1, so BE never cleared a cancelled job from
+//      `_tablet_replica_cache`.
+//   2. FE side (FrontendServiceImpl.getTabletReplicaInfos): when the job
+//      had been removed from `cloudWarmUpJobs` (after
+//      history_cloud_warm_up_job_keep_max_second), the branch still
+//      called `job.getJobId()` on a null reference, throwing NPE to BE.
+//
+// After the fix, once a warm-up job is cancelled BE must drop it from
+// its cache on the next RPC (via the CANCELLED TStatus), so that later
+// expiry removal on FE never receives follow-up requests with the dead
+// job_id and no NPE path is exercised.
+suite('test_warm_up_cluster_event_cancel_expired', 'docker') {
+    def options = new ClusterOptions()
+    options.feConfigs += [
+        'cloud_cluster_check_interval_second=1',
+        // Keep expiry small so FE removes the cancelled job quickly.
+        'history_cloud_warm_up_job_keep_max_second=30',
+    ]
+    options.beConfigs += [
+        'file_cache_enter_disk_resource_limit_mode_percent=99',
+        'enable_evict_file_cache_in_advance=false',
+        'file_cache_background_monitor_interval_ms=1000',
+    ]
+    options.cloudMode = true
+
+    def clearFileCache = { ip, port ->
+        def url = "http://${ip}:${port}/api/file_cache?op=clear&sync=true";
+        def response = new URL(url).text
+        def json = new JsonSlurper().parseText(response)
+        if (json.status != "OK") {
+            throw new RuntimeException("Clear cache on ${ip}:${port} failed: 
${json.status}")
+        }

Review Comment:
   `clearFileCache` always uses a plain `http://` URL and `new URL(url).text`, 
so this suite will fail in TLS-enabled runs (the framework expects `enableTLS` 
to switch to HTTPS and use the configured trust/key material). Please use the 
regression framework HTTP helpers (e.g., `http_client`/`curl`/`Http.GET`) that 
honor `enableTLS`, or at least switch to `https://` when TLS is enabled.



##########
regression-test/suites/cloud_p0/cache/multi_cluster/warm_up/cluster/test_warm_up_cluster_event_cancel_expired.groovy:
##########
@@ -0,0 +1,203 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+import org.apache.doris.regression.suite.ClusterOptions
+import groovy.json.JsonSlurper
+
+// Covers a two-part bug on the event-driven warm-up path:
+//   1. BE side (cloud_warm_up_manager.cpp): `st.is<CANCELED>()` used the
+//      proto enum value PCacheStatus::CANCELED=9 instead of
+//      ErrorCode::CANCELLED=1, so BE never cleared a cancelled job from
+//      `_tablet_replica_cache`.
+//   2. FE side (FrontendServiceImpl.getTabletReplicaInfos): when the job
+//      had been removed from `cloudWarmUpJobs` (after
+//      history_cloud_warm_up_job_keep_max_second), the branch still
+//      called `job.getJobId()` on a null reference, throwing NPE to BE.
+//
+// After the fix, once a warm-up job is cancelled BE must drop it from
+// its cache on the next RPC (via the CANCELLED TStatus), so that later
+// expiry removal on FE never receives follow-up requests with the dead
+// job_id and no NPE path is exercised.
+suite('test_warm_up_cluster_event_cancel_expired', 'docker') {
+    def options = new ClusterOptions()
+    options.feConfigs += [
+        'cloud_cluster_check_interval_second=1',
+        // Keep expiry small so FE removes the cancelled job quickly.
+        'history_cloud_warm_up_job_keep_max_second=30',
+    ]
+    options.beConfigs += [
+        'file_cache_enter_disk_resource_limit_mode_percent=99',
+        'enable_evict_file_cache_in_advance=false',
+        'file_cache_background_monitor_interval_ms=1000',
+    ]
+    options.cloudMode = true
+
+    def clearFileCache = { ip, port ->
+        def url = "http://${ip}:${port}/api/file_cache?op=clear&sync=true";
+        def response = new URL(url).text
+        def json = new JsonSlurper().parseText(response)
+        if (json.status != "OK") {
+            throw new RuntimeException("Clear cache on ${ip}:${port} failed: 
${json.status}")
+        }
+    }
+
+    def clearFileCacheOnAllBackends = {
+        def backends = sql """SHOW BACKENDS"""
+        for (be in backends) {
+            clearFileCache(be[1], be[4])
+        }
+        sleep(10000)
+    }
+
+    def getBrpcMetrics = { ip, port, name ->
+        def url = "http://${ip}:${port}/brpc_metrics";
+        if 
((context.config.otherConfigs.get("enableTLS")?.toString()?.equalsIgnoreCase("true"))
 ?: false) {
+            url = url.replace("http://";, "https://";) + " --cert " + 
context.config.otherConfigs.get("trustCert") + " --cacert " + 
context.config.otherConfigs.get("trustCACert") + " --key " + 
context.config.otherConfigs.get("trustCAKey")
+        }
+        def metrics = new URL(url).text

Review Comment:
   When `enableTLS` is true this builds `url` by appending curl CLI flags 
("--cert ...") and then passes it to `new URL(url).text`. That produces an 
invalid URL (spaces/extra args) and will throw at runtime. Use 
`curl()`/`http_client()` (which accept TLS cert settings) to fetch 
`/brpc_metrics`, or keep `url` as a valid URL and handle TLS via the framework 
SSL configuration instead of embedding CLI args.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to