YuriyKrasilnikov commented on PR #37516:
URL: https://github.com/apache/superset/pull/37516#issuecomment-4763022674

   Thanks again for the rebase and for pointing out the conflict areas.
   
   I re-reviewed the PR from current master through the rebased branch, 
especially the `get_df_payload()` area where the BigQuery warning/memory-limit 
handling and the `applied_filter_columns` cache-miss logic landed near the 
timing code.
   
   ## What changed
   
   I kept `CHART_DATA_INCLUDE_TIMING` as a config option with default `False` 
for now.
   
   I agree this means the API response does not include timing out of the box, 
so it does not fully answer #37507 by default. But after looking at the 
security/API-contract concern again, I think default-on timing should be an 
explicit maintainer decision. The current behavior is:
   
   - timing is still always collected internally
   - `STATS_LOGGER` metrics and slow-query logging still work regardless of the 
config
   - API response timing stays opt-in via `CHART_DATA_INCLUDE_TIMING = True`
   
   If the preferred direction is default `True`, or moving this behind a 
feature flag instead of config, I can switch it.
   
   ## Root cause / fix
   
   The main issue I found is that timing was being finalized too early in 
`get_df_payload()`. That missed later response work like `_get_full()` 
serialization, `query_context.get_data()`, and the `POST_PROCESSED` path 
through `apply_client_processing()`.
   
   The fix now carries timing as internal query state until command/response 
finalization, then strips the internal key before returning the public 
response. This makes `result_processing_ms` and `total_ms` include the 
response-shaping work.
   
   ## Response consistency
   
   I also fixed the `_get_full` inconsistency you called out:
   
   - when response timing is disabled, `RESULTS` no longer returns `timing: 
null`
   - `FULL` and `RESULTS` both omit timing unless the config is enabled
   - `db_execution_ms` is explicitly `null` on cache hits
   
   Separately, CodeAnt pointed out a related `RESULTS` flow mismatch around 
`is_cached`: `_get_full()` reduces the payload and does not carry top-level 
`is_cached`, so the timing finalizer could no longer derive the cache state 
from the reduced result. I fixed that by carrying `is_cached` inside the 
internal timing state before reduction, and keeping public `timing.is_cached` 
as a strict boolean.
   
   Also addressed the older Bito notes by removing the no-op `db_execution_ms` 
assignment and documenting response-only `timing.is_cached`.
   
   One related issue I found while reviewing the conflict area: when an old 
cached payload is invalidated because it has no `applied_filter_columns`, the 
code re-runs the query but could still keep cache-hit metadata like 
`is_cached=True` / `cached_dttm`. I cleared that metadata in the invalidation 
path so the reloaded result is reported as a miss.
   
   Added/updated tests for full/results responses, deferred finalization, cache 
hits/misses, stale-cache invalidation, and the disabled timing response path.
   
   Latest push is `945d7048a8`.
   
   P.S. While reviewing the `POST_PROCESSED` response path, I noticed a 
possible adjacent CSV issue that looks separate from this timing PR.
   
   The path is:
   
   - `QueryContextProcessor.get_data()` can return CSV data as `bytes` after 
applying `CSV_EXPORT["encoding"]`
   - `_send_chart_response()` calls `apply_client_processing()` before the 
final response is built for `POST_PROCESSED`
   - `apply_client_processing()` then reads CSV data with `StringIO(data)`, 
which expects `str`, not `bytes`
   
   <details>
   <summary>Relevant code paths</summary>
   
   ```python
   # superset/common/query_context_processor.py
   if self._query_context.result_format == ChartDataResultFormat.CSV:
       result = csv.df_to_escaped_csv(
           df, index=include_index, **current_app.config["CSV_EXPORT"]
       )
       result = result.encode(
           current_app.config["CSV_EXPORT"].get("encoding", "utf-8")
       )
   ```
   
   ```python
   # superset/charts/data/api.py
   if result_type == ChartDataResultType.POST_PROCESSED:
       result = apply_client_processing(result, form_data, datasource)
   ```
   
   ```python
   # superset/charts/client_processing.py
   elif query["result_format"] == ChartDataResultFormat.CSV:
       df = pd.read_csv(
           StringIO(data),
           keep_default_na=na_values is None,
           na_values=na_values,
       )
   ```
   
   </details>
   
   I found #36682 / #38616 in the same area, but those seem focused on 
`CSV_EXPORT` delimiter/decimal handling rather than the exact `bytes` vs 
`StringIO` case. I did not change it here to keep this PR scoped to timing, but 
it may be worth checking separately.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to