anishgirianish commented on issue #44145:
URL: https://github.com/apache/airflow/issues/44145#issuecomment-3901476886
Hi @uranusjr, I'd like to pick this up if it's available. I've been
contributing to Airflow and have been following the AIP-76 work closely.
I've gone through the existing merged PRs (#57360, #58289, #59115, #60934)
and the AIP spec to understand the current partition infrastructure. Here's my
read of the implementation path; happy to be corrected:
PartitionBySequence
- New PartitionMapper subclass accepting an explicit ordered list of string
keys (e.g., ["us", "eu", "asia"])
- Defined in airflow-core/src/airflow/partition_mapper/ with a mirrored
definition in task-sdk/src/airflow/sdk/definitions/partition_mapper/
- to_downstream() would validate that the key exists in the sequence and pass
it through unchanged
- serialize()/deserialize() to persist the sequence list
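A minimal standalone sketch of the sequence mapper described above, assuming a plain Python class rather than a subclass of Airflow's real PartitionMapper (whose exact interface isn't reproduced here, so the real signatures may differ). The `strict` flag is hypothetical and only illustrates the two behaviours asked about in question 1; having `serialize()` return a dict is likewise an assumption.

```python
from __future__ import annotations

from typing import Any


class PartitionBySequence:
    """Hypothetical stand-in for the proposed mapper.

    Not a subclass of Airflow's PartitionMapper; it only keeps the method
    names from the proposal (to_downstream, serialize, deserialize).
    """

    def __init__(self, keys: list[str], strict: bool = True) -> None:
        if len(set(keys)) != len(keys):
            raise ValueError("sequence keys must be unique")
        self.keys = list(keys)  # preserve the declared order
        self.strict = strict  # hypothetical flag illustrating question 1

    def to_downstream(self, key: str) -> str:
        # Identity mapping: a known key passes through unchanged.
        if self.strict and key not in self.keys:
            raise ValueError(
                f"unknown partition key {key!r}; expected one of {self.keys}"
            )
        return key

    def serialize(self) -> dict[str, Any]:
        # Persist only what is needed to rebuild the mapper.
        return {"keys": list(self.keys), "strict": self.strict}

    @classmethod
    def deserialize(cls, data: dict[str, Any]) -> PartitionBySequence:
        return cls(keys=data["keys"], strict=data.get("strict", True))
```

With `strict=True`, `PartitionBySequence(["us", "eu", "asia"]).to_downstream("eu")` returns `"eu"`, while an unknown key such as `"africa"` raises `ValueError`; with `strict=False` any key is passed through.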
PartitionByProduct
- Combines up to one time-based dimension with multiple segment-based
dimensions, as the AIP describes
- Composes existing partition types and generates Cartesian product keys
- Key format would need a convention for compound keys (e.g., tuple
serialization)
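A rough sketch of Cartesian product key generation, assuming each dimension can be described by a hypothetical `Dimension` record holding a finite tuple of keys. A real time-based dimension would presumably be derived from a schedule rather than listed explicitly; the explicit dates below are only for illustration. Compound keys are kept as plain tuples here, leaving the string format open.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    """Hypothetical description of one partition dimension."""

    name: str
    keys: tuple[str, ...]
    time_based: bool = False


def product_keys(dimensions: list[Dimension]) -> list[tuple[str, ...]]:
    """Enumerate every compound key as a tuple, one element per dimension."""
    if sum(d.time_based for d in dimensions) > 1:
        # The AIP allows at most one time-based dimension.
        raise ValueError("at most one time-based dimension is allowed")
    # itertools.product varies the last dimension fastest.
    return list(itertools.product(*(d.keys for d in dimensions)))
```

For example:

```python
dims = [
    Dimension("date", ("2024-01-01", "2024-01-02"), time_based=True),
    Dimension("region", ("us", "eu")),
]
product_keys(dims)
# → [('2024-01-01', 'us'), ('2024-01-01', 'eu'), ('2024-01-02', 'us'), ('2024-01-02', 'eu')]
```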
Shared work
- Register both in BUILTIN_PARTITION_MAPPERS in encoders.py
- Export from task-sdk/src/airflow/sdk/__init__.py
- Tests and documentation
A couple of questions before I start:
1. For PartitionBySequence, should to_downstream() enforce that incoming keys
are members of the defined sequence, or should it be more permissive?
2. For PartitionByProduct compound keys, is there a preferred serialization
format for multi-dimensional keys, or is that still an open design question?
3. I see @Lee-W's work on per-asset partition mapping (#60966). I believe
this is independent, since I'm adding new mapper types rather than changing how
mappers are assigned. I just want to confirm that no coordination is needed.
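To make question 2 concrete, here is a comparison of two candidate string formats for compound keys. All three helpers (`join_key`, `json_key`, `parse_json_key`) are hypothetical, and neither format reflects anything Airflow has adopted. The point is that a plain delimiter join is ambiguous whenever a segment key can contain the delimiter, while a JSON array escapes it and round-trips.

```python
from __future__ import annotations

import json


def join_key(parts: tuple[str, ...], sep: str = "|") -> str:
    # Naive convention: readable, but ambiguous if any part contains `sep`.
    return sep.join(parts)


def json_key(parts: tuple[str, ...]) -> str:
    # JSON array: quoting keeps every part recoverable.
    # Compact separators so equal tuples always give identical strings.
    return json.dumps(list(parts), separators=(",", ":"))


def parse_json_key(key: str) -> tuple[str, ...]:
    # Inverse of json_key.
    return tuple(json.loads(key))
```

`join_key(("a|b", "c"))` and `join_key(("a", "b|c"))` both produce `"a|b|c"`, whereas the JSON form yields `'["a|b","c"]'` and `'["a","b|c"]'`. The trade-off is that the JSON form is noisier to read.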
I'll open a draft PR early so you can steer the direction before I go too
deep. Looking forward to contributing to AIP-76.