sadpandajoe commented on code in PR #34513:
URL: https://github.com/apache/superset/pull/34513#discussion_r2801632065
##########
tests/unit_tests/db_engine_specs/test_postgres.py:
##########
@@ -280,3 +280,34 @@ def quote_table(table: Table, dialect: Dialect) -> str:
LIMIT :param_1
""".strip()
)
+
+
+def test_interval_type_mutator() -> None:
+ """
+ DB Eng Specs (postgres): Test INTERVAL type mutator
+
+ INTERVAL values are converted to milliseconds so users can apply
+ the built-in "DURATION" number format for human-readable display.
+ """
+ mutator = spec.column_type_mutators[INTERVAL]
+
+ # Test timedelta conversion (most common case from psycopg2)
+ # Result is in milliseconds for compatibility with DURATION formatter
+ td = timedelta(days=1, hours=2, minutes=30, seconds=45)
+ assert mutator(td) == 95445000.0 # Total ms: (1*86400 + 2*3600 + 30*60 +
45) * 1000
+
+ # Test numeric values (assumed to be seconds) are converted to milliseconds
+ assert mutator(12345) == 12345000.0
+ assert mutator(123.45) == 123450.0
+
+ # Test None returns 0 (for aggregations)
+ assert mutator(None) == 0
+
+ # Test string values pass through unchanged
+ # (PostgreSQL may return string representations in some cases)
+ assert mutator("1 day 02:30:45") == "1 day 02:30:45"
+ assert mutator("P1DT2H30M45S") == "P1DT2H30M45S" # ISO 8601 duration
+
+ # Test other types pass through unchanged
+ assert mutator([1, 2, 3]) == [1, 2, 3]
+ assert mutator({"days": 1}) == {"days": 1}
Review Comment:
Good coverage of the lambda itself. A few additions that would help catch
integration issues and edge cases:
**1. Integration test via `fetch_data`** (follows the pattern in
`test_mysql.py:253`)
This verifies the full wiring: `cursor.description` → type lookup → mutator
application.
```python
@pytest.mark.parametrize(
"data,description,expected_result",
[
(
[(timedelta(hours=1, minutes=30),)],
[("duration", "interval")],
[(5400000.0,)], # 1.5 hours in ms
),
(
[(None,)],
[("duration", "interval")],
[(None,)], # NULLs preserved
),
(
[(timedelta(hours=1), "label")],
[("duration", "interval"), ("name", "varchar(50)")],
[(3600000.0, "label")], # Only INTERVAL column mutated
),
],
)
def test_interval_column_type_mutator(
data: list[tuple[Any, ...]],
description: list[tuple[str, str]],
expected_result: list[tuple[Any, ...]],
) -> None:
from unittest.mock import Mock
cursor = Mock()
cursor.description = description
cursor.fetchall.return_value = data
assert spec.fetch_data(cursor) == expected_result
```
**2. Column spec test** — verify INTERVAL is classified as NUMERIC:
```python
# Add to the existing test_get_column_spec parametrize list:
("INTERVAL", INTERVAL, None, GenericDataType.NUMERIC, False),
```
**3. Edge cases for the mutator:**
```python
# Zero duration
assert mutator(timedelta(0)) == 0.0
# Negative interval
assert mutator(timedelta(days=-1)) == -86400000.0
# Bool should not be treated as numeric
assert mutator(True) is None
assert mutator(False) is None
```
##########
superset/db_engine_specs/postgres.py:
##########
@@ -489,8 +490,30 @@ class PostgresEngineSpec(BasicParametersMixin,
PostgresBaseEngineSpec):
ENUM(),
GenericDataType.STRING,
),
+ (
+ re.compile(r"^interval", re.IGNORECASE),
+ INTERVAL(),
+ GenericDataType.NUMERIC,
+ ),
)
+ # PostgreSQL INTERVAL values need normalization for chart rendering.
+ # psycopg2 returns timedelta objects which we convert to milliseconds for
+ # numeric operations in bar/pie charts. Using milliseconds allows users to
+ # apply the built-in "DURATION" number format for human-readable display
+ # (e.g., "1d 2h 30m 45s"). String representations pass through unchanged.
+ column_type_mutators: dict[types.TypeEngine, Callable[[Any], Any]] = {
+ INTERVAL: lambda v: (
+ v.total_seconds() * 1000
+ if hasattr(v, "total_seconds")
+ else float(v) * 1000
+ if isinstance(v, (int, float))
+ else 0
+ if v is None
+ else v # Pass through string representations and other types
+ ),
+ }
Review Comment:
A few things in this lambda worth addressing:
**1. `None` → `0` changes NULL semantics**
NULL INTERVAL means "unknown/missing" — converting to `0` makes it
indistinguishable from an actual zero-duration interval. It also means AVG
aggregations on the frontend would include false zeros instead of excluding
missing data. Returning `None` preserves the correct semantics.
**2. `bool` leaks through the numeric branch**
In Python, `bool` is a subclass of `int`, so `isinstance(True, (int,
float))` is `True`. This means `True` → `1000.0` and `False` → `0.0`, which
probably isn't intended for an INTERVAL column.
**3. String/other pass-through creates mixed-type NUMERIC columns**
Since the column type mapping above declares INTERVAL as
`GenericDataType.NUMERIC`, passing through strings/lists/dicts unchanged could
produce mixed-type columns that break chart rendering on the frontend. Might be
safer to return `None` for anything that can't be converted to a number.
Here's a version that addresses all three:
```python
def _normalize_interval(v: Any) -> Any:
"""Convert PostgreSQL INTERVAL values to milliseconds for the DURATION
formatter."""
if v is None:
return None
if hasattr(v, "total_seconds"):
return v.total_seconds() * 1000
if isinstance(v, (int, float)) and not isinstance(v, bool):
return float(v) * 1000
return None # Can't convert to numeric — treat as missing
column_type_mutators: dict[types.TypeEngine, Callable[[Any], Any]] = {
INTERVAL: _normalize_interval,
}
```
This also makes it easier to test individual branches and add edge cases
later.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]