codeant-ai-for-open-source[bot] commented on code in PR #36680:
URL: https://github.com/apache/superset/pull/36680#discussion_r2624431835

##########
superset/migrations/versions/2025-12-16_12-00_f5b5f88d8526_fix_form_data_string_in_query_context.py:
##########
@@ -0,0 +1,108 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""fix_form_data_string_in_query_context
+
+Revision ID: f5b5f88d8526
+Revises: a9c01ec10479
+Create Date: 2025-12-16 12:00:00.000000
+
+"""
+
+import logging
+
+from alembic import op
+from sqlalchemy import Column, Integer, String, Text
+from sqlalchemy.ext.declarative import declarative_base
+
+from superset import db
+from superset.migrations.shared.utils import paginated_update
+from superset.utils import json
+
+# revision identifiers, used by Alembic.
+revision = "f5b5f88d8526"
+down_revision = "a9c01ec10479"
+
+Base = declarative_base()
+logger = logging.getLogger(__name__)
+
+# Viz types that have migrations that were going through the bug
+MIGRATED_VIZ_TYPES = [
+    "treemap_v2",
+    "pivot_table_v2",
+    "mixed_timeseries",
+    "sunburst_v2",
+    "echarts_timeseries_line",
+    "echarts_timeseries_smooth",
+    "echarts_timeseries_step",
+    "echarts_area",
+    "echarts_timeseries_bar",
+    "bubble_v2",
+    "heatmap_v2",
+    "histogram_v2",
+    "sankey_v2",
+]
+
+
+class Slice(Base):
+    __tablename__ = "slices"
+
+    id = Column(Integer, primary_key=True)
+    viz_type = Column(String(250))
+    query_context = Column(Text)
+
+
+def upgrade():
+    """
+    Fix charts where form_data in query_context was stored as a JSON string
+    instead of a dict during chart import migration.
+    """
+    bind = op.get_bind()
+    session = db.Session(bind=bind)
+
+    for slc in paginated_update(
+        session.query(Slice).filter(
+            Slice.viz_type.in_(MIGRATED_VIZ_TYPES),
+            Slice.query_context.isnot(None),
+        )
+    ):
+        try:
+            query_context = json.loads(slc.query_context)
+            form_data = query_context.get("form_data")

Review Comment:
**Suggestion:** The code assumes that the decoded `query_context` is always a dict and immediately calls `.get`, but if the column contains valid non-object JSON (e.g. a string or list), `json.loads` will succeed and `.get` will raise an AttributeError, causing the migration to skip that slice via the broad `except` and leave it in a broken state; adding an explicit type check for a dict before accessing `.get` avoids this runtime type error and ensures only well-formed contexts are processed. [type error]

**Severity Level:** Minor ⚠️
```suggestion
            if not isinstance(query_context, dict):
                logger.warning(
                    "Unexpected query_context format for slice %s, skipping", slc.id
                )
                continue
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>

The current code assumes json.loads(...) returns a dict and immediately calls .get.
If query_context is valid JSON but not an object (e.g. a list or string), calling .get will raise an AttributeError which is then swallowed by the broad except, skipping the row silently. Adding an explicit isinstance(query_context, dict) check is a small, correct defensive fix that prevents that runtime type error and makes the migration behavior explicit.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>

```mdx
This is a comment left during a code review.
**Path:** superset/migrations/versions/2025-12-16_12-00_f5b5f88d8526_fix_form_data_string_in_query_context.py
**Line:** 84:84
**Comment:**
	*Type Error: The code assumes that the decoded `query_context` is always a dict and immediately calls `.get`, but if the column contains valid non-object JSON (e.g. a string or list), `json.loads` will succeed and `.get` will raise an AttributeError, causing the migration to skip that slice via the broad `except` and leave it in a broken state; adding an explicit type check for a dict before accessing `.get` avoids this runtime type error and ensures only well-formed contexts are processed.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
```
</details>

##########
superset/migrations/versions/2025-12-16_12-00_f5b5f88d8526_fix_form_data_string_in_query_context.py:
##########
@@ -0,0 +1,108 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.
You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""fix_form_data_string_in_query_context
+
+Revision ID: f5b5f88d8526
+Revises: a9c01ec10479
+Create Date: 2025-12-16 12:00:00.000000
+
+"""
+
+import logging
+
+from alembic import op
+from sqlalchemy import Column, Integer, String, Text
+from sqlalchemy.ext.declarative import declarative_base
+
+from superset import db
+from superset.migrations.shared.utils import paginated_update
+from superset.utils import json
+
+# revision identifiers, used by Alembic.
+revision = "f5b5f88d8526"
+down_revision = "a9c01ec10479"
+
+Base = declarative_base()
+logger = logging.getLogger(__name__)
+
+# Viz types that have migrations that were going through the bug
+MIGRATED_VIZ_TYPES = [
+    "treemap_v2",
+    "pivot_table_v2",
+    "mixed_timeseries",
+    "sunburst_v2",
+    "echarts_timeseries_line",
+    "echarts_timeseries_smooth",
+    "echarts_timeseries_step",
+    "echarts_area",
+    "echarts_timeseries_bar",
+    "bubble_v2",
+    "heatmap_v2",
+    "histogram_v2",
+    "sankey_v2",
+]
+
+
+class Slice(Base):
+    __tablename__ = "slices"
+
+    id = Column(Integer, primary_key=True)
+    viz_type = Column(String(250))
+    query_context = Column(Text)
+
+
+def upgrade():
+    """
+    Fix charts where form_data in query_context was stored as a JSON string
+    instead of a dict during chart import migration.
+    """
+    bind = op.get_bind()
+    session = db.Session(bind=bind)
+
+    for slc in paginated_update(
+        session.query(Slice).filter(
+            Slice.viz_type.in_(MIGRATED_VIZ_TYPES),
+            Slice.query_context.isnot(None),
+        )
+    ):
+        try:
+            query_context = json.loads(slc.query_context)
+            form_data = query_context.get("form_data")
+
+            # Check if form_data is a non-empty string (the bug)
+            if form_data and isinstance(form_data, str):
+                try:
+                    query_context["form_data"] = json.loads(form_data)

Review Comment:
**Suggestion:** After parsing `form_data` from a JSON string, the code directly assigns the result back into `query_context["form_data"]` without verifying it is a dict, so if the stored string represents a non-object JSON value (e.g. a list or scalar), later consumers expecting a dict will see an invalid type and may fail when constructing a `QueryContext`; adding a type check to only persist dict-like parsed values prevents this latent runtime type error. [type error]

**Severity Level:** Minor ⚠️
```suggestion
                    parsed_form_data = json.loads(form_data)
                    if not isinstance(parsed_form_data, dict):
                        logger.warning(
                            "Unexpected form_data type for slice %s, skipping", slc.id
                        )
                        continue
                    query_context["form_data"] = parsed_form_data
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>

The suggestion correctly prevents persisting parsed form_data that isn't a mapping (dict). If the stored string decodes to a list/scalar, downstream code expecting a dict will later fail; skipping such cases (or at least not writing invalid types back) is safer. This is a real correctness improvement for data integrity in the migration.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>

```mdx
This is a comment left during a code review.
**Path:** superset/migrations/versions/2025-12-16_12-00_f5b5f88d8526_fix_form_data_string_in_query_context.py
**Line:** 89:89
**Comment:**
	*Type Error: After parsing `form_data` from a JSON string, the code directly assigns the result back into `query_context["form_data"]` without verifying it is a dict, so if the stored string represents a non-object JSON value (e.g. a list or scalar), later consumers expecting a dict will see an invalid type and may fail when constructing a `QueryContext`; adding a type check to only persist dict-like parsed values prevents this latent runtime type error.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
```
</details>

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
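
For reference, the two type checks suggested above can be exercised outside the migration. The following is only a minimal sketch, not the PR's code: `repair_query_context` is a hypothetical helper name, it uses the standard-library `json` module rather than `superset.utils.json`, and it returns `None` where the migration loop would `continue`.

```python
import json
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def repair_query_context(raw: str, slice_id: int) -> Optional[str]:
    """Hypothetical helper: return repaired JSON text, or None to skip the row."""
    try:
        query_context = json.loads(raw)
    except json.JSONDecodeError:
        logger.warning("Invalid query_context JSON for slice %s, skipping", slice_id)
        return None

    # Check 1: valid JSON is not necessarily an object (it may be a list or scalar).
    if not isinstance(query_context, dict):
        logger.warning(
            "Unexpected query_context format for slice %s, skipping", slice_id
        )
        return None

    form_data = query_context.get("form_data")
    # Only a non-empty string is the buggy shape; anything else is left untouched.
    if not (form_data and isinstance(form_data, str)):
        return None

    try:
        parsed_form_data = json.loads(form_data)
    except json.JSONDecodeError:
        logger.warning("Invalid form_data JSON for slice %s, skipping", slice_id)
        return None

    # Check 2: only write back a dict, never a list or scalar.
    if not isinstance(parsed_form_data, dict):
        logger.warning("Unexpected form_data type for slice %s, skipping", slice_id)
        return None

    query_context["form_data"] = parsed_form_data
    return json.dumps(query_context)
```

In this sketch a row is rewritten only when `form_data` is a string that decodes to a dict; every other shape is logged and left as stored, which mirrors the "skip rather than persist an invalid type" intent of both suggestions.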
