uranusjr commented on PR #41424: URL: https://github.com/apache/airflow/pull/41424#issuecomment-2314509500
Status:

* Added `name` to Dataset. Model changes and migration added.
* Both `name` and `uri` are optional, but the user must supply at least one of them. This is checked when a Dataset is created.
* The database enforces that _both `name` and `uri` are unique_. This means you can't have two assets (with different names) pointing to the same URI. It makes the dataset resolution logic a lot simpler. Probably good enough in most cases?
* The DAG processor (during `bulk_write_to_db`) collects all datasets and deduplicates name and URI values. Currently this is done trivially (it just randomly drops one of the duplicates). We might want to emit warnings, especially since the uniqueness constraint above can be confusing for users in edge cases.

Todo:

* Fix existing tests. Many tests currently do something like `Dataset(uri)`. Should we fix all of them, or should we just make the positional argument the URI instead? Currently it's the name.
* DatasetAlias's `add` interface (in `OutletEventAccessor`) currently only takes the URI. It should also accept the name and intelligently select the correct Dataset from the string value, preferring a name match over a URI match.
* Add tests for various name-URI combinations.
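To make the "at least one of `name` / `uri`" rule concrete, here is a minimal sketch. `DatasetSketch` is a hypothetical stand-in, not the actual Airflow `Dataset` class; it only assumes both fields are optional and that the check runs at construction time.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetSketch:
    """Hypothetical stand-in for a Dataset with optional name and URI."""

    name: str | None = None
    uri: str | None = None

    def __post_init__(self) -> None:
        # Reject the object at creation time if neither field was supplied
        # (empty strings are treated as missing too).
        if not self.name and not self.uri:
            raise ValueError("Dataset requires at least one of 'name' or 'uri'")


# The first positional argument is the name, so an old-style Dataset(uri)
# call written against this signature would set the name, not the URI.
by_name = DatasetSketch("daily_report")
by_uri = DatasetSketch(uri="s3://bucket/report.csv")
```

This is also why the positional-argument question in the Todo matters: with `name` first, existing `Dataset(uri)` calls still construct successfully but populate the wrong field.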
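The "both `name` and `uri` must be unique" constraint can be illustrated with an in-memory SQLite table. The schema below is a hypothetical sketch (table and column layout are assumptions, not the migration in this PR); it only shows why two differently named assets cannot share a URI.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table shape: each column is unique on its own, and a CHECK
# mirrors the "at least one of name/uri" rule at the database level.
conn.execute(
    """
    CREATE TABLE dataset (
        id INTEGER PRIMARY KEY,
        name TEXT UNIQUE,
        uri TEXT UNIQUE,
        CHECK (name IS NOT NULL OR uri IS NOT NULL)
    )
    """
)
conn.execute(
    "INSERT INTO dataset (name, uri) VALUES (?, ?)", ("orders", "s3://bucket/orders")
)


def try_insert(name, uri):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO dataset (name, uri) VALUES (?, ?)", (name, uri))
        return True
    except sqlite3.IntegrityError:
        return False


# A second asset with a different name but the same URI violates UNIQUE(uri).
same_uri_accepted = try_insert("orders_v2", "s3://bucket/orders")
```

Note that SQLite, like PostgreSQL by default, treats NULLs as distinct under `UNIQUE`, so several name-only or URI-only rows can coexist.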
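For the `bulk_write_to_db` deduplication step, a deterministic keep-first pass that warns instead of dropping silently might look like the sketch below. `DatasetRef` and `dedupe_datasets` are hypothetical names, and the warning text is an assumption; exact repeats of the same (name, uri) pair are treated as the same dataset declared twice and merged without a warning.

```python
from __future__ import annotations

import warnings
from typing import Iterable, NamedTuple, Optional


class DatasetRef(NamedTuple):
    """Hypothetical (name, uri) pair collected from parsed DAGs."""

    name: Optional[str]
    uri: Optional[str]


def dedupe_datasets(datasets: Iterable[DatasetRef]) -> list[DatasetRef]:
    """Keep the first dataset seen for each name and URI; warn about the rest."""
    seen_names: set[str] = set()
    seen_uris: set[str] = set()
    kept: list[DatasetRef] = []
    kept_set: set[DatasetRef] = set()
    for ds in datasets:
        if ds in kept_set:
            # Same dataset declared again (e.g. by another DAG): nothing to report.
            continue
        name_clash = ds.name is not None and ds.name in seen_names
        uri_clash = ds.uri is not None and ds.uri in seen_uris
        if name_clash or uri_clash:
            # Surface the conflict instead of dropping one side silently.
            warnings.warn(f"Dropping {ds!r}: name or URI already used by another dataset")
            continue
        if ds.name is not None:
            seen_names.add(ds.name)
        if ds.uri is not None:
            seen_uris.add(ds.uri)
        kept.append(ds)
        kept_set.add(ds)
    return kept
```

Keeping the first occurrence (rather than an arbitrary one) makes the result reproducible across parses, which should make any warning easier for users to act on.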
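For the DatasetAlias `add` Todo, the "match name over URI" rule could be sketched as a two-pass lookup. `resolve_dataset` and `DatasetRef` are hypothetical helpers, not the `OutletEventAccessor` API; the sketch only assumes the accessor can see the known datasets as (name, uri) pairs.

```python
from __future__ import annotations

from typing import NamedTuple, Optional, Sequence


class DatasetRef(NamedTuple):
    """Hypothetical (name, uri) pair for a known dataset."""

    name: Optional[str]
    uri: Optional[str]


def resolve_dataset(value: str, datasets: Sequence[DatasetRef]) -> DatasetRef:
    """Resolve a string to a dataset, trying names before URIs."""
    # First pass: an exact name match always wins.
    for ds in datasets:
        if ds.name == value:
            return ds
    # Second pass: fall back to matching the URI.
    for ds in datasets:
        if ds.uri == value:
            return ds
    raise LookupError(f"No dataset with name or URI {value!r}")
```

Since name and URI are unique per column but not across columns, one dataset's name can equal another dataset's URI; the name-first order decides that case.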
