Copilot commented on code in PR #36225:
URL: https://github.com/apache/superset/pull/36225#discussion_r2551075992
##########
superset/common/utils/query_cache_manager.py:
##########
@@ -158,6 +158,8 @@ def get(
if cache_value := _cache[region].get(key):
logger.debug("Cache key: %s", key)
+ # Log cache hit for debugging
+ logger.info("CACHE GET - Key: %s, Region: %s", key, region)
Review Comment:
This `logger.info()` call executes on every cache hit in production.
Consider using `logger.debug()` instead to reduce log volume, as this is
debugging information. High-frequency INFO-level logging can negatively impact
performance and increase log storage costs.
```suggestion
logger.debug("CACHE GET - Key: %s, Region: %s", key, region)
```
##########
superset/utils/cache.py:
##########
@@ -39,7 +39,14 @@
def generate_cache_key(values_dict: dict[str, Any], key_prefix: str = "") -> str:
hash_str = md5_sha_from_dict(values_dict, default=json_int_dttm_ser)
- return f"{key_prefix}{hash_str}"
+ cache_key = f"{key_prefix}{hash_str}"
+ # Log cache key generation for debugging
+ logger.debug(
+ "Cache key generated: %s from dict keys: %s",
+ cache_key,
+ list(values_dict.keys()),
+ )
Review Comment:
The `list(values_dict.keys())` operation creates an unnecessary list object
on every cache key generation. Consider logging only when debug level is
enabled:
```python
if logger.isEnabledFor(logging.DEBUG):
logger.debug("Cache key generated: %s from dict keys: %s", cache_key,
list(values_dict.keys()))
```
This avoids the overhead of creating the list when debug logging is disabled.
```suggestion
if logger.isEnabledFor(logging.DEBUG):
logger.debug(
"Cache key generated: %s from dict keys: %s",
cache_key,
list(values_dict.keys()),
)
```
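The benefit of the suggested guard can be shown directly: without it, `list(values_dict.keys())` is built on every call even when DEBUG is off, because Python evaluates arguments before the `logger.debug` call. A small sketch (the counter and helper are illustrative):

```python
import logging

logging.basicConfig(level=logging.INFO)  # DEBUG disabled, as in production
logger = logging.getLogger("superset.utils.cache.demo")

calls = {"n": 0}

def expensive_keys(d):
    # Stands in for list(values_dict.keys()) on a hot path
    calls["n"] += 1
    return list(d.keys())

values_dict = {"datasource": 1, "query": "SELECT 1"}

# Unguarded: the argument is evaluated even though DEBUG is off
logger.debug("keys: %s", expensive_keys(values_dict))

# Guarded: the argument is never evaluated when DEBUG is off
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("keys: %s", expensive_keys(values_dict))

print(calls["n"])
```

Only the unguarded call pays the cost, so the counter ends at 1; the guarded form is free on the cache-key hot path unless an operator enables DEBUG.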
##########
superset/common/query_object.py:
##########
@@ -480,7 +481,16 @@ def cache_key(self, **extra: Any) -> str: # noqa: C901
# datasource or database do not exist
pass
- return md5_sha_from_dict(cache_dict, default=json_int_dttm_ser, ignore_nan=True)
+ cache_key = md5_sha_from_dict(
+ cache_dict, default=json_int_dttm_ser, ignore_nan=True
+ )
+ # Log QueryObject cache key generation for debugging
+ logger.info(
+ "QueryObject CACHE KEY generated: %s from dict with keys: %s",
+ cache_key,
+ sorted(cache_dict.keys()),
Review Comment:
This `logger.info()` call with `sorted(cache_dict.keys())` executes on every
cache key generation. Consider:
1. Using `logger.debug()` instead of `logger.info()`
2. Removing the `sorted()` call which creates unnecessary overhead
High-frequency INFO-level logging can negatively impact query performance.
```suggestion
logger.debug(
"QueryObject CACHE KEY generated: %s from dict with keys: %s",
cache_key,
list(cache_dict.keys()),
```
##########
superset/utils/cache.py:
##########
@@ -68,6 +75,14 @@ def set_and_log_cache(
stats_logger = app.config["STATS_LOGGER"]
stats_logger.incr("set_cache_key")
+ # Log cache key details for debugging
+ logger.info(
+ "CACHE SET - Key: %s, Datasource: %s, Timeout: %s",
+ cache_key,
+ datasource_uid,
+ timeout,
+ )
Review Comment:
This `logger.info()` call will execute on every cache set operation in
production. Consider:
1. Using `logger.debug()` instead of `logger.info()` to reduce log volume,
as this is debugging information
2. Making this logging conditional on a debug flag or feature flag to avoid
performance impact in production environments
High-frequency logging at INFO level can impact performance and increase log
storage costs.
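The second option (gating on a flag) could look like the sketch below. The flag name `CACHE_DEBUG_LOGGING` is hypothetical; in Superset, such a setting would be read from `app.config` rather than a module constant:

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical operator-controlled flag; Superset would read this from
# app.config rather than a module-level constant.
CACHE_DEBUG_LOGGING = False

def log_cache_set(cache_key, datasource_uid, timeout):
    """Emit cache-set details only when the operator opts in."""
    if CACHE_DEBUG_LOGGING:
        logger.info(
            "CACHE SET - Key: %s, Datasource: %s, Timeout: %s",
            cache_key,
            datasource_uid,
            timeout,
        )
    return cache_key
```

With the flag off (the default), the hot path pays only a single boolean check per cache set.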
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]