jgoedeke opened a new issue, #57564:
URL: https://github.com/apache/airflow/issues/57564

   ### Apache Airflow version
   
   3.1.1
   
   ### If "Other Airflow 2/3 version" selected, which one?
   
   _No response_
   
   ### What happened?
   
   When attaching event-specific metadata to assets in Airflow via the `extra` field (e.g., `Metadata(asset, {'timestamp': datetime.now(), ...})`), serializable objects such as `datetime` and `Path` are converted to strings during serialization, but are **not deserialized back to their original Python types** when accessed in downstream tasks. This leads to unexpected behavior and type mismatches for users who expect the original types.
   
   ### Example
   
   ```python
   from datetime import UTC, datetime
   from pathlib import Path
   
   from airflow.sdk import Asset, Metadata, dag, task
   
   asset = Asset(
       's3://bucket/mydata.csv',
       extra={
           'path': Path('/some/local/path.txt'),
       },
   )
   
   
   @dag(
       schedule=None,
   )
   def producer_dag():
       @task(outlets=[asset])
       def produce_asset():
           yield Metadata(asset, {'timestamp': datetime.now(tz=UTC)})
   
       produce_asset()
   
   
   @dag(
       schedule=[asset],
   )
   def consumer_dag():
       @task(inlets=[asset])
       def consume_asset(inlet_events):
           events: list[Metadata] = inlet_events[asset]
           for event in events:
               timestamp = event.extra.get('timestamp')
               path = event.asset.extra.get('path')
               print(f'Asset: {event.asset}')
               print(f'Asset extra: {event.asset.extra}')
               print(f'Asset extra path type: {type(path)}, value: {path} ')
               print(f'Extra metadata: {event.extra}')
            print(f'Extra timestamp type: {type(timestamp)}, value: {timestamp} ')
   
       consume_asset()
   
   
   producer_dag()
   consumer_dag()
   
   ```
   
   #### Output
   ```
   [2025-10-30 14:36:21] INFO - Asset: name='s3://bucket/mydata.csv' uri='s3://bucket/mydata.csv' group='asset' extra={'path': '/some/local/path.txt'} source=task.stdout
   [2025-10-30 14:36:21] INFO - Asset extra: {'path': '/some/local/path.txt'} source=task.stdout
   [2025-10-30 14:36:21] INFO - Asset extra path type: <class 'str'>, value: /some/local/path.txt source=task.stdout
   [2025-10-30 14:36:21] INFO - Extra metadata: {'timestamp': '2025-10-30T14:35:16.937410Z'} source=task.stdout
   [2025-10-30 14:36:21] INFO - Extra timestamp type: <class 'str'>, value: 2025-10-30T14:35:16.937410Z source=task.stdout
   ```
   
   - The `extra` dict values are always strings, regardless of their original type.
   - Downstream code must manually parse/convert these values, which is error-prone and inconsistent (a workaround along these lines is sketched below).
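
   Until this is fixed, the consumer has to convert the values back by hand. Below is a minimal workaround sketch, assuming the `extra` values arrive as the ISO-8601 and path strings shown in the output above; `parse_event_extra` is a hypothetical helper, not part of the SDK:

   ```python
   from datetime import datetime
   from pathlib import Path


   def parse_event_extra(extra: dict) -> dict:
       """Hypothetical helper: restore known string-encoded values to Python types."""
       parsed = dict(extra)
       if 'timestamp' in parsed:
           # datetime.fromisoformat() only accepts the trailing 'Z' on
           # Python 3.11+; replace it so this also works on older versions.
           parsed['timestamp'] = datetime.fromisoformat(
               parsed['timestamp'].replace('Z', '+00:00')
           )
       if 'path' in parsed:
           parsed['path'] = Path(parsed['path'])
       return parsed
   ```

   Calling `parse_event_extra(event.extra)` inside `consume_asset` then yields a real `datetime` and `Path` again, but only because the consumer hard-codes each key's intended type.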
   
   
   ### What you think should happen instead?
   
   - When retrieving asset event metadata, deserializable values (e.g., ISO datetime strings, Path strings) should be converted back to their corresponding Python types (e.g., `datetime`, `Path`).
   - This would restore consistency with how XCom values are deserialized; see the round-trip sketch after this list.
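
   For illustration, XCom round-trips such values through Airflow's serializer, which stores type information alongside the value. A minimal sketch of the behavior this issue asks for, assuming the internal `airflow.serialization.serde` module (the machinery behind XCom serialization) is importable; it is not a documented public API, so this is illustrative only:

   ```python
   from datetime import UTC, datetime

   from airflow.serialization.serde import deserialize, serialize

   extra = {'timestamp': datetime.now(tz=UTC)}

   encoded = serialize(extra)      # JSON-safe structure carrying type information
   decoded = deserialize(encoded)  # values restored to their original types

   print(type(decoded['timestamp']))  # <class 'datetime.datetime'>
   ```

   Applying the same round-trip to asset-event extras would make `event.extra['timestamp']` a `datetime` again on the consumer side.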
   
   ### How to reproduce
   
   Run the example DAG above.
   
   ### Operating System
   
   docker
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Docker-Compose
   
   ### Deployment details
   
   _No response_
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

