Copilot commented on code in PR #36225:
URL: https://github.com/apache/superset/pull/36225#discussion_r2551075992
##########
superset/common/utils/query_cache_manager.py:
##########
@@ -158,6 +158,8 @@ def get(
if cache_value := _cache[region].get(key):
logger.debug("Cache key: %s", key)
+ # Log cache hit for debugging
+ logger.info("CACHE GET - Key: %s, Region: %s", key, region)
Review Comment:
This `logger.info()` call executes on every cache hit in production.
Consider using `logger.debug()` instead to reduce log volume, as this is
debugging information. High-frequency INFO-level logging can negatively impact
performance and increase log storage costs.
```suggestion
logger.debug("CACHE GET - Key: %s, Region: %s", key, region)
```
##########
superset/utils/cache.py:
##########
@@ -39,7 +39,14 @@
def generate_cache_key(values_dict: dict[str, Any], key_prefix: str = "") -> str:
hash_str = md5_sha_from_dict(values_dict, default=json_int_dttm_ser)
- return f"{key_prefix}{hash_str}"
+ cache_key = f"{key_prefix}{hash_str}"
+ # Log cache key generation for debugging
+ logger.debug(
+ "Cache key generated: %s from dict keys: %s",
+ cache_key,
+ list(values_dict.keys()),
+ )
Review Comment:
The `list(values_dict.keys())` operation creates an unnecessary list object
on every cache key generation. Consider logging only when debug level is
enabled:
```python
if logger.isEnabledFor(logging.DEBUG):
logger.debug("Cache key generated: %s from dict keys: %s", cache_key,
list(values_dict.keys()))
```
This avoids the overhead of creating the list when debug logging is disabled.
```suggestion
if logger.isEnabledFor(logging.DEBUG):
logger.debug(
"Cache key generated: %s from dict keys: %s",
cache_key,
list(values_dict.keys()),
)
```
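The benefit of the suggested guard can be shown directly: without it, `list(values_dict.keys())` is built on every call even when DEBUG is off, because Python evaluates arguments before the `logger.debug` call. A small sketch (the counter and helper are illustrative):

```python
import logging

logging.basicConfig(level=logging.INFO)  # DEBUG disabled, as in production
logger = logging.getLogger("superset.utils.cache.demo")

calls = {"n": 0}

def expensive_keys(d):
    # Stands in for list(values_dict.keys()) on a hot path
    calls["n"] += 1
    return list(d.keys())

values_dict = {"datasource": 1, "query": "SELECT 1"}

# Unguarded: the argument is evaluated even though DEBUG is off
logger.debug("keys: %s", expensive_keys(values_dict))

# Guarded: the argument is never evaluated when DEBUG is off
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("keys: %s", expensive_keys(values_dict))

print(calls["n"])
```

Only the unguarded call pays the cost, so the counter ends at 1; the guarded form is free on the cache-key hot path unless an operator enables DEBUG.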
##########
superset/common/query_object.py:
##########
@@ -480,7 +481,16 @@ def cache_key(self, **extra: Any) -> str: # noqa: C901
# datasource or database do not exist
pass
- return md5_sha_from_dict(cache_dict, default=json_int_dttm_ser, ignore_nan=True)
+ cache_key = md5_sha_from_dict(
+ cache_dict, default=json_int_dttm_ser, ignore_nan=True
+ )
+ # Log QueryObject cache key generation for debugging
+ logger.info(
+ "QueryObject CACHE KEY generated: %s from dict with keys: %s",
+ cache_key,
+ sorted(cache_dict.keys()),
Review Comment:
This `logger.info()` call with `sorted(cache_dict.keys())` executes on every
cache key generation. Consider:
1. Using `logger.debug()` instead of `logger.info()`
2. Removing the `sorted()` call which creates unnecessary overhead
High-frequency INFO-level logging can negatively impact query performance.
```suggestion
logger.debug(
"QueryObject CACHE KEY generated: %s from dict with keys: %s",
cache_key,
list(cache_dict.keys()),
```
##########
superset/utils/cache.py:
##########
@@ -68,6 +75,14 @@ def set_and_log_cache(
stats_logger = app.config["STATS_LOGGER"]
stats_logger.incr("set_cache_key")
+ # Log cache key details for debugging
+ logger.info(
+ "CACHE SET - Key: %s, Datasource: %s, Timeout: %s",
+ cache_key,
+ datasource_uid,
+ timeout,
+ )
Review Comment:
This `logger.info()` call will execute on every cache set operation in
production. Consider:
1. Using `logger.debug()` instead of `logger.info()` to reduce log volume,
as this is debugging information
2. Making this logging conditional on a debug flag or feature flag to avoid
performance impact in production environments
High-frequency logging at INFO level can impact performance and increase log
storage costs.
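The second option (gating on a flag) could look like the sketch below. The flag name `CACHE_DEBUG_LOGGING` is hypothetical; in Superset, such a setting would be read from `app.config` rather than a module constant:

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical operator-controlled flag; Superset would read this from
# app.config rather than a module-level constant.
CACHE_DEBUG_LOGGING = False

def log_cache_set(cache_key, datasource_uid, timeout):
    """Emit cache-set details only when the operator opts in."""
    if CACHE_DEBUG_LOGGING:
        logger.info(
            "CACHE SET - Key: %s, Datasource: %s, Timeout: %s",
            cache_key,
            datasource_uid,
            timeout,
        )
    return cache_key
```

With the flag off (the default), the hot path pays only a single boolean check per cache set.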
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]