codeant-ai-for-open-source[bot] commented on code in PR #37014:
URL: https://github.com/apache/superset/pull/37014#discussion_r2716290317


##########
superset/models/helpers.py:
##########
@@ -892,6 +892,16 @@ def get_extra_cache_keys(self, query_obj: QueryObjectDict) 
-> list[Hashable]:
     def get_template_processor(self, **kwargs: Any) -> BaseTemplateProcessor:
         raise NotImplementedError()
 
+    def get_dataset_timezone(self) -> str | None:
+        """
+        Get the timezone configured for this dataset from the extra JSON field.
+
+        Returns an IANA timezone name (e.g., "Europe/Berlin", 
"America/New_York")
+        or None if not configured.
+
+        """
+        return self.extra_dict.get("timezone")

Review Comment:
   **Suggestion:** The new `get_dataset_timezone` uses `self.extra_dict` which 
doesn't exist on the mixin or most datasource classes and will raise 
AttributeError at runtime; use available accessors (`get_extra_dict` or the 
`extra` property) to read the JSON extras safely. [null pointer]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ Queries error when dataset timezone reading fails.
   - ❌ Dashboard time-based queries return HTTP 500.
   - ⚠️ Timezone conversion feature becomes non-functional.
   ```
   </details>
   
   ```suggestion
           # Prefer dataset helper if available, otherwise fall back to `extra` 
property.
           if hasattr(self, "get_extra_dict") and callable(getattr(self, 
"get_extra_dict")):
               extras = self.get_extra_dict() or {}
           else:
               extras = getattr(self, "extra", {}) or {}
           return extras.get("timezone")
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Trigger a query that executes ExploreMixin.normalize_df in 
superset/models/helpers.py
   (hunk around lines 1247-1282). The normalize_df implementation calls 
dataset_timezone =
   self.get_dataset_timezone() at superset/models/helpers.py:1270.
   
   2. Execution enters get_dataset_timezone at 
superset/models/helpers.py:895-903 which
   returns self.extra_dict.get("timezone"). The attribute `extra_dict` does not 
exist on
   ExploreMixin or ExtraJSONMixin in this file.
   
   3. Python raises AttributeError("'ExploreMixin' object has no attribute 
'extra_dict'") at
   superset/models/helpers.py:903, propagating back to the query path (e.g., 
get_query_result
   -> normalize_df) and causing the request to fail.
   
   4. The same broken accessor is also invoked from get_time_filter
   (superset/models/helpers.py:2406) when building time filters; any call path 
that resolves
   to get_dataset_timezone will reproduce the AttributeError. This is not a 
stylistic issue —
   it's a real missing attribute that will raise at runtime.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/models/helpers.py
   **Line:** 903:903
   **Comment:**
        *Null Pointer: The new `get_dataset_timezone` uses `self.extra_dict` 
which doesn't exist on the mixin or most datasource classes and will raise 
AttributeError at runtime; use available accessors (`get_extra_dict` or the 
`extra` property) to read the JSON extras safely.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/utils/core.py:
##########
@@ -1951,8 +1954,28 @@ def normalize_dttm_col(
 
         _process_datetime_column(df, _col)
 
-        if _col.offset:
+        if _col.timezone:
+            try:
+                tz = pytz.timezone(_col.timezone)
+                # Data is stored in UTC, convert to the dataset's configured 
timezone
+                # First make the datetime UTC-aware, then convert to target 
timezone
+                series = df[_col.col_label]
+                if not series.empty and series.notna().any():
+                    # Convert UTC to target timezone
+                    df[_col.col_label] = (
+                        series.dt.tz_localize("UTC")
+                        .dt.tz_convert(tz)
+                        .dt.tz_localize(None)  # Remove timezone info for 
display
+                    )
+            except pytz.UnknownTimeZoneError:
+                logging.warning(
+                    "Unknown timezone '%s', falling back to offset", 
_col.timezone
+                )
+                if _col.offset:
+                    df[_col.col_label] += timedelta(hours=_col.offset)
+        elif _col.offset:
             df[_col.col_label] += timedelta(hours=_col.offset)

Review Comment:
   **Suggestion:** Offset handling is incorrect: the code adds the configured 
`offset` hours to the datetime (using `+= timedelta(hours=_col.offset)`) which 
moves times in the wrong direction when converting local boundaries to UTC; 
additionally, the code treats offset `0` as falsy so a valid zero offset is 
ignored. Change the logic to subtract the offset (and check `is not None`) so 
offsets of `0` are handled. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ Dashboard time filters produce incorrect UTC boundaries.
   - ❌ Queries to /data return wrong time ranges.
   - ⚠️ Charts and aggregations misaligned by timezone offset.
   - ⚠️ Dataset timezone/offset settings produce incorrect results.
   ```
   </details>
   
   ```suggestion
                   if _col.offset is not None:
                       # Subtract offset to convert local time to UTC (offset 
is hours ahead)
                       df[_col.col_label] -= timedelta(hours=_col.offset)
           elif _col.offset is not None:
               # Subtract configured offset (support zero offset)
               df[_col.col_label] -= timedelta(hours=_col.offset)
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Open a Python REPL or test and import normalize_dttm_col from 
superset/utils/core.py
   (function begins at file line ~1934 where `def normalize_dttm_col(` is 
defined).
   
   2. Construct a DataFrame with a datetime column named "__timestamp", e.g. in 
code at
   runtime by:
   
      - df = pd.DataFrame({"__timestamp": pd.to_datetime(["2026-01-09 
00:00:00"])})
   
   3. Create a DateColumn instance with offset set (example using class defined 
around lines
   ~1847 in superset/utils/core.py):
   
      - from superset.utils.core import DateColumn
   
      - col = DateColumn(col_label="__timestamp", offset=6)
   
   4. Call normalize_dttm_col(df, (col,)) (function at superset/utils/core.py: 
~1934).
   Observe resulting df["__timestamp"] has been increased by +6 hours (current 
code does +=
   timedelta), whereas the PR description and intended behavior expect the 
local midnight
   (UTC+6) to be converted to UTC by subtracting 6 hours. Also observe that 
setting offset=0
   (DateColumn(offset=0)) will be ignored by current code because `if 
_col.offset:` treats 0
   as falsy, so no adjustment is applied.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/utils/core.py
   **Line:** 1974:1977
   **Comment:**
        *Logic Error: Offset handling is incorrect: the code adds the 
configured `offset` hours to the datetime (using `+= 
timedelta(hours=_col.offset)`) which moves times in the wrong direction when 
converting local boundaries to UTC; additionally, the code treats offset `0` as 
falsy so a valid zero offset is ignored. Change the logic to subtract the 
offset (and check `is not None`) so offsets of `0` are handled.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to