kaxil commented on code in PR #67902:
URL: https://github.com/apache/airflow/pull/67902#discussion_r3348112147


##########
airflow-core/src/airflow/api_fastapi/execution_api/routes/asset_store.py:
##########
@@ -35,15 +35,39 @@
 from fastapi import HTTPException, Query, status
 from sqlalchemy import select
 
-from airflow._shared.state import AssetScope
+from airflow._shared.state import AssetScope, AssetStoreWriterKind
 from airflow.api_fastapi.common.db.common import SessionDep
 from airflow.api_fastapi.execution_api.datamodels.asset_store import (
     AssetStorePutBody,
     AssetStoreResponse,
 )
-from airflow.api_fastapi.execution_api.security import ExecutionAPIRoute
+from airflow.api_fastapi.execution_api.datamodels.token import TIToken
+from airflow.api_fastapi.execution_api.security import CurrentTIToken, 
ExecutionAPIRoute
 from airflow.models.asset import AssetModel
+from airflow.models.taskinstance import TaskInstance
 from airflow.state import get_state_backend
+from airflow.state.metastore import MetastoreStoreBackend
+
+_TIWriterFields = tuple[str, str, str, int]
+
+
+def _fetch_ti_writer_fields(token: TIToken, session: SessionDep) -> 
_TIWriterFields:
+    """Return (dag_id, run_id, task_id, map_index) for the TI identified by 
the token."""
+    row = session.execute(
+        select(
+            TaskInstance.dag_id,
+            TaskInstance.run_id,
+            TaskInstance.task_id,
+            TaskInstance.map_index,
+        ).where(TaskInstance.id == token.id)
+    ).one_or_none()
+    if row is None:
+        raise HTTPException(
+            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,

Review Comment:
   A missing TI here returns 500, but the sibling execution-API routes in 
`task_instances.py` return 404 `{"reason": "not_found", ...}` for the same 
stale-TI condition (e.g. the TI row was cleaned up between token mint and this 
PUT). A 500 reads as a server bug and can trigger client retries / 5xx alerts. 
Returning 404 to match the convention would be more accurate.
   



##########
airflow-core/src/airflow/models/asset_store.py:
##########
@@ -52,3 +63,9 @@ class AssetStoreModel(Base):
             ondelete="CASCADE",
         ),
     )
+
+    @property
+    def last_updated_by_kind_enum(self) -> AssetStoreWriterKind | None:

Review Comment:
   Is `last_updated_by_kind_enum` used anywhere? The routes build the response 
straight off the `last_updated_by_kind` string column, so this looks unused in 
the PR. If a UI/API consumer is coming, fine to keep; otherwise it's dead code.
   



##########
airflow-core/src/airflow/state/metastore.py:
##########
@@ -279,18 +279,68 @@ def _get_asset_store(self, scope: AssetScope, key: str, 
*, session: Session) ->
         )
         return row.value if row is not None else None
 
-    def _set_asset_store(self, scope: AssetScope, key: str, value: str, *, 
session: Session) -> None:
+    def _set_asset_store(
+        self,
+        scope: AssetScope,
+        key: str,
+        value: str,
+        *,
+        kind: AssetStoreWriterKind | None = None,
+        dag_id: str | None = None,
+        run_id: str | None = None,
+        task_id: str | None = None,
+        map_index: int | None = None,
+        session: Session,
+    ) -> None:
         now = timezone.utcnow()
-        values = dict(asset_id=scope.asset_id, key=key, value=value, 
updated_at=now)
+        kind_str = kind.value if kind is not None else None
+        writer_info = dict(
+            last_updated_by_kind=kind_str,
+            last_updated_by_dag_id=dag_id,
+            last_updated_by_run_id=run_id,
+            last_updated_by_task_id=task_id,
+            last_updated_by_map_index=map_index,
+        )
+        values = dict(asset_id=scope.asset_id, key=key, value=value, 
updated_at=now, **writer_info)
         stmt = _build_upsert_stmt(
             get_dialect_name(session),
             AssetStoreModel,
             ["asset_id", "key"],
             values,
-            dict(value=value, updated_at=now),
+            dict(value=value, updated_at=now, **writer_info),

Review Comment:
   The upsert always lists the `last_updated_by_*` columns in the update set, 
so a plain `set()` on an AssetScope (which reaches `_set_asset_store` with 
`kind=None`) overwrites previously recorded writer info back to NULL on 
conflict. No in-tree Metastore caller hits that path today since both routes go 
through `set_asset_store`, but the follow-up watcher write (#67839) and the 
worker-side `AssetStoreAccessor` also write asset state, and if either uses 
`set()` instead of `set_asset_store()`, attribution gets silently cleared with 
no error. Consider dropping the writer columns from the update set when `kind 
is None` so an attribution-less write leaves the existing values untouched.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to