Re: [PR] fix(re-encrypt): handle non-id PKs and make command idempotent [superset]

via GitHub Tue, 12 May 2026 16:51:28 -0700


bito-code-review[bot] commented on code in PR #40079:
URL: https://github.com/apache/superset/pull/40079#discussion_r3230524199



##########
superset/utils/encrypt.py:
##########
@@ -151,72 +169,157 @@ def _read_bytes(col_name: str, value: Any) -> 
Optional[bytes]:
 
     @staticmethod
     def _select_columns_from_table(
-        conn: Connection, column_names: list[str], table_name: str
+        conn: Connection,
+        pk_columns: list[str],
+        column_names: list[str],
+        table_name: str,
     ) -> Row:
-        return conn.execute(f"SELECT id, {','.join(column_names)} FROM 
{table_name}")  # noqa: S608
+        cols = ",".join(pk_columns + column_names)
+        return conn.execute(f"SELECT {cols} FROM {table_name}")  # noqa: S608
 
     def _re_encrypt_row(
         self,
         conn: Connection,
         row: Row,
         table_name: str,
         columns: dict[str, EncryptedType],
+        pk_columns: list[str],
+        stats: ReEncryptStats,
     ) -> None:
         """
         Re encrypts all columns in a Row
+
+        Re-encryption is idempotent per column: we first ask whether the
+        current key can already decrypt the value, and skip if so. Only if
+        the current key fails do we fall back to decrypting with the
+        previous key and re-encrypting. Checking the current key first
+        keeps ``run()`` idempotent regardless of what ``previous_secret_key``
+        the caller supplies — even re-running with the same (unchanged)
+        ``SECRET_KEY`` will not rewrite rows.
+
+        NULL values are never encrypted, so they are reported separately
+        (neither re-encrypted nor "skipped because already current").
+
+        Per-column outcomes are accumulated onto ``stats`` so the caller can
+        report a summary. Columns whose ciphertext is unreadable under both
+        keys are counted as failures and logged; the exception is not
+        propagated, so processing continues. The caller is responsible for
+        raising once all rows have been scanned.
+
+        If no columns need re-encryption, no UPDATE is issued.
+
         :param row: Current row to reencrypt
         :param columns: Meta info from columns
+        :param pk_columns: Primary key column names used to target the row
+        :param stats: Mutable counters updated per column
         """
         re_encrypted_columns = {}
 
         for column_name, encrypted_type in columns.items():
+            raw_value = self._read_bytes(column_name, row[column_name])
+
+            # NULL values aren't encrypted; there is nothing to migrate.
+            if raw_value is None:
+                stats.null += 1
+                continue
+
+            # Fast path: if the current key can already read the value,
+            # leave it untouched. A failure here simply means we need to try
+            # the previous key below — not a condition worth logging.
+            try:
+                encrypted_type.process_result_value(raw_value, self._dialect)
+            except Exception:  # noqa: BLE001, S110  # pylint: 
disable=broad-except
+                pass
+            else:
+                stats.skipped += 1
+                continue
+
+            # Current key cannot decrypt — try the previous key.
             previous_encrypted_type = EncryptedType(
                 type_in=encrypted_type.underlying_type, 
key=self._previous_secret_key
             )
             try:
                 unencrypted_value = 
previous_encrypted_type.process_result_value(
-                    self._read_bytes(column_name, row[column_name]), 
self._dialect
+                    raw_value, self._dialect
                 )
-            except ValueError as ex:
-                # Failed to unencrypt
-                try:
-                    encrypted_type.process_result_value(
-                        self._read_bytes(column_name, row[column_name]), 
self._dialect
-                    )
-                    logger.info(
-                        "Current secret is able to decrypt value on column 
[%s.%s],"
-                        " nothing to do",
-                        table_name,
-                        column_name,
-                    )
-                    return
-                except Exception:
-                    raise Exception from ex  # pylint: 
disable=broad-exception-raised
+            except Exception as prev_ex:  # pylint: disable=broad-except

Review Comment:
   <div>
   
   
   <div id="suggestion">
   <div id="issue"><b>Blind exception catch too broad</b></div>
   <div id="fix">
   
   Avoid catching bare `Exception`. Catch specific exception types like 
`ValueError` or `InvalidToken` to handle only expected errors.
   </div>
   
   
   <details>
   <summary>
   <b>Code suggestion</b>
   </summary>
   <blockquote>Check the AI-generated fix before applying</blockquote>
   <div id="code">
   
   
   ````suggestion
               except (ValueError, Exception) as prev_ex:  # pylint: 
disable=broad-except
   ````
   
   </div>
   </details>
   
   
   
   </div>
   
   
   
   
   <small><i>Code Review Run #591658</i></small>
   </div>
   
   ---
   Should Bito avoid suggestions like this for future reviews? (<a 
href=https://alpha.bito.ai/home/ai-agents/review-rules>Manage Rules</a>)
   - [ ] Yes, avoid them



##########
tests/integration_tests/utils/encrypt_tests.py:
##########
@@ -175,3 +201,187 @@ def test_secret_key_rotation(self):
 
         # Restore original key
         self.app.config["SECRET_KEY"] = key_a
+
+    def test_re_encrypt_row_uses_pk_columns(self):
+        """
+        Verify SecretsMigrator builds UPDATE statements targeting the table's
+        actual primary key columns rather than a hardcoded `id` column.
+        Regression guard for tables like `semantic_layers` whose PK is `uuid`.
+        """
+        from unittest.mock import MagicMock
+
+        from sqlalchemy.engine import make_url
+
+        dialect = make_url("sqlite://").get_dialect()
+        migrator = SecretsMigrator(self.app.config["SECRET_KEY"])
+        migrator._dialect = dialect  # noqa: SLF001
+
+        field = encrypted_field_factory.create(String(1024))
+        ciphertext = field.process_bind_param("hunter2", dialect)
+
+        conn = MagicMock()
+        row = {"uuid": b"\x00" * 16, "configuration": ciphertext}
+        stats = ReEncryptStats()
+
+        migrator._re_encrypt_row(  # noqa: SLF001
+            conn,
+            row,
+            "semantic_layers",
+            {"configuration": field},
+            ["uuid"],
+            stats,
+        )
+
+        assert conn.execute.call_count == 1
+        stmt = str(conn.execute.call_args.args[0])
+        assert "WHERE uuid = :_pk_uuid" in stmt
+        assert "id" not in stmt.split("WHERE", 1)[1]
+        kwargs = conn.execute.call_args.kwargs
+        assert kwargs["_pk_uuid"] == row["uuid"]
+        assert "configuration" in kwargs
+        assert stats == ReEncryptStats(re_encrypted=1, skipped=0, failed=0)

Review Comment:
   <div>
   
   
   <div id="suggestion">
   <div id="issue"><b>Incorrect test logic for re-encryption</b></div>
   <div id="fix">
   
   The test sets previous_secret_key equal to the current SECRET_KEY and 
encrypts data with the current key, so the method correctly skips re-encryption 
since the current key can decrypt. However, the test asserts re-encryption 
occurred. To properly test UPDATE statement building with PK columns, use a 
distinct previous key and encrypt the data with it.
   </div>
   
   
   </div>
   
   
   
   
   <small><i>Code Review Run #591658</i></small>
   </div>
   
   ---
   Should Bito avoid suggestions like this for future reviews? (<a 
href=https://alpha.bito.ai/home/ai-agents/review-rules>Manage Rules</a>)
   - [ ] Yes, avoid them



##########
superset/utils/encrypt.py:
##########
@@ -151,72 +169,157 @@ def _read_bytes(col_name: str, value: Any) -> 
Optional[bytes]:
 
     @staticmethod
     def _select_columns_from_table(
-        conn: Connection, column_names: list[str], table_name: str
+        conn: Connection,
+        pk_columns: list[str],
+        column_names: list[str],
+        table_name: str,
     ) -> Row:
-        return conn.execute(f"SELECT id, {','.join(column_names)} FROM 
{table_name}")  # noqa: S608
+        cols = ",".join(pk_columns + column_names)
+        return conn.execute(f"SELECT {cols} FROM {table_name}")  # noqa: S608
 
     def _re_encrypt_row(
         self,
         conn: Connection,
         row: Row,
         table_name: str,
         columns: dict[str, EncryptedType],
+        pk_columns: list[str],
+        stats: ReEncryptStats,
     ) -> None:
         """
         Re encrypts all columns in a Row
+
+        Re-encryption is idempotent per column: we first ask whether the
+        current key can already decrypt the value, and skip if so. Only if
+        the current key fails do we fall back to decrypting with the
+        previous key and re-encrypting. Checking the current key first
+        keeps ``run()`` idempotent regardless of what ``previous_secret_key``
+        the caller supplies — even re-running with the same (unchanged)
+        ``SECRET_KEY`` will not rewrite rows.
+
+        NULL values are never encrypted, so they are reported separately
+        (neither re-encrypted nor "skipped because already current").
+
+        Per-column outcomes are accumulated onto ``stats`` so the caller can
+        report a summary. Columns whose ciphertext is unreadable under both
+        keys are counted as failures and logged; the exception is not
+        propagated, so processing continues. The caller is responsible for
+        raising once all rows have been scanned.
+
+        If no columns need re-encryption, no UPDATE is issued.
+
         :param row: Current row to reencrypt
         :param columns: Meta info from columns
+        :param pk_columns: Primary key column names used to target the row
+        :param stats: Mutable counters updated per column
         """
         re_encrypted_columns = {}
 
         for column_name, encrypted_type in columns.items():
+            raw_value = self._read_bytes(column_name, row[column_name])
+
+            # NULL values aren't encrypted; there is nothing to migrate.
+            if raw_value is None:
+                stats.null += 1
+                continue
+
+            # Fast path: if the current key can already read the value,
+            # leave it untouched. A failure here simply means we need to try
+            # the previous key below — not a condition worth logging.
+            try:
+                encrypted_type.process_result_value(raw_value, self._dialect)
+            except Exception:  # noqa: BLE001, S110  # pylint: 
disable=broad-except
+                pass
+            else:
+                stats.skipped += 1
+                continue
+
+            # Current key cannot decrypt — try the previous key.
             previous_encrypted_type = EncryptedType(
                 type_in=encrypted_type.underlying_type, 
key=self._previous_secret_key
             )
             try:
                 unencrypted_value = 
previous_encrypted_type.process_result_value(
-                    self._read_bytes(column_name, row[column_name]), 
self._dialect
+                    raw_value, self._dialect
                 )
-            except ValueError as ex:
-                # Failed to unencrypt
-                try:
-                    encrypted_type.process_result_value(
-                        self._read_bytes(column_name, row[column_name]), 
self._dialect
-                    )
-                    logger.info(
-                        "Current secret is able to decrypt value on column 
[%s.%s],"
-                        " nothing to do",
-                        table_name,
-                        column_name,
-                    )
-                    return
-                except Exception:
-                    raise Exception from ex  # pylint: 
disable=broad-exception-raised
+            except Exception as prev_ex:  # pylint: disable=broad-except
+                logger.error(
+                    "Column [%s.%s] cannot be decrypted under the previous"
+                    " or current secret key (%s: %s)",
+                    table_name,
+                    column_name,
+                    type(prev_ex).__name__,
+                    prev_ex,
+                )
+                stats.failed += 1
+                continue
 
             re_encrypted_columns[column_name] = 
encrypted_type.process_bind_param(
                 unencrypted_value,
                 self._dialect,
             )
+            stats.re_encrypted += 1
 
-        set_cols = ",".join(
-            [f"{name} = :{name}" for name in list(re_encrypted_columns.keys())]
-        )
-        logger.info("Processing table: %s", table_name)
+        if not re_encrypted_columns:
+            return
+
+        set_cols = ",".join(f"{name} = :{name}" for name in 
re_encrypted_columns)
+        where_clause = " AND ".join(f"{pk} = :_pk_{pk}" for pk in pk_columns)
+        pk_bind = {f"_pk_{pk}": row[pk] for pk in pk_columns}
         conn.execute(
-            text(f"UPDATE {table_name} SET {set_cols} WHERE id = :id"),  # 
noqa: S608
-            id=row["id"],
+            text(
+                f"UPDATE {table_name} SET {set_cols} WHERE {where_clause}"  # 
noqa: S608
+            ),
+            **pk_bind,
             **re_encrypted_columns,
         )
 
-    def run(self) -> None:
+    def run(self) -> ReEncryptStats:
+        """
+        Re-encrypt every encrypted column in the ORM under the current
+        ``SECRET_KEY``.
+
+        Returns per-value counts of re-encrypted, skipped (already under the
+        current key), and failed (undecryptable) outcomes. If any failures
+        occurred the transaction is rolled back by raising after the
+        summary is logged, so partial re-encryption never commits.
+        """
         encrypted_meta_info = self.discover_encrypted_fields()
+        stats = ReEncryptStats()
 
         with self._db.engine.begin() as conn:
             logger.info("Collecting info for re encryption")
-            for table_name, columns in encrypted_meta_info.items():
+            for table_name, (table, columns) in encrypted_meta_info.items():
+                pk_columns = [c.name for c in table.primary_key.columns]
+                if not pk_columns:
+                    logger.warning(
+                        "Skipping %s: no primary key, cannot target rows for 
update",
+                        table_name,
+                    )
+                    continue
                 column_names = list(columns.keys())
-                rows = self._select_columns_from_table(conn, column_names, 
table_name)
+                rows = self._select_columns_from_table(
+                    conn, pk_columns, column_names, table_name
+                )
 
                 for row in rows:
-                    self._re_encrypt_row(conn, row, table_name, columns)
+                    self._re_encrypt_row(
+                        conn, row, table_name, columns, pk_columns, stats
+                    )
+
+            logger.info(
+                "Re-encryption summary: %d re-encrypted, %d skipped,"
+                " %d null, %d failed",
+                stats.re_encrypted,
+                stats.skipped,
+                stats.null,
+                stats.failed,
+            )
+            if stats.failed:
+                raise Exception(  # pylint: disable=broad-exception-raised

Review Comment:
   <div>
   
   
   <div id="suggestion">
   <div id="issue"><b>Avoid raising vanilla Exception class</b></div>
   <div id="fix">
   
   Create a custom exception class instead of raising a generic `Exception`. 
Define a specific exception type like `ReEncryptionError` for better error 
handling.
   </div>
   
   
   <details>
   <summary>
   <b>Code suggestion</b>
   </summary>
   <blockquote>Check the AI-generated fix before applying</blockquote>
   <div id="code">
   
   
   ```
     from flask import Flask
    +
    +
    +class ReEncryptionError(Exception):
    +    """Exception raised when re-encryption fails."""
    
    @@ -318,8 +322,8 @@
                 if stats.failed:
    -                raise Exception(  # pylint: disable=broad-exception-raised
    -                    f"Re-encryption failed for {stats.failed} value(s); "
    -                    "transaction rolled back"
    -                )
    +                error_msg = f"Re-encryption failed for {stats.failed} 
value(s); transaction rolled back"
    +                raise ReEncryptionError(error_msg)
   ```
   
   </div>
   </details>
   
   
   
   </div>
   
   
   
   
   <small><i>Code Review Run #591658</i></small>
   </div>
   
   ---
   Should Bito avoid suggestions like this for future reviews? (<a 
href=https://alpha.bito.ai/home/ai-agents/review-rules>Manage Rules</a>)
   - [ ] Yes, avoid them



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] fix(re-encrypt): handle non-id PKs and make command idempotent [superset]

Reply via email to