uranusjr commented on PR #41424:
URL: https://github.com/apache/airflow/pull/41424#issuecomment-2314509500

   Status:
   
   * Added `name` to Dataset. Model changes and migration added.
   * Both `name` and `uri` are optional, but the user must supply at least one 
of them. There’s a check when a Dataset is created.
   * The database enforces that _both `name` and `uri` must be unique_. This
means you can’t have two assets (with different names) pointing to the same
URI. It makes dataset resolution logic a lot simpler. Probably good enough in
most cases?
   * The DAG processor (during `bulk_write_to_db`) collects all datasets and 
de-duplicates them by name and URI. Currently this is done trivially (one of 
the duplicates is just dropped at random). We might want to emit warnings, 
especially since the uniqueness constraint above can be confusing for users in 
edge cases.
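
A minimal sketch of the creation check and the de-duplication step described above. `DatasetSketch` and `dedupe` are hypothetical stand-ins, not the code in this PR, and the sketch keeps the first dataset seen rather than dropping one at random, so that the result is reproducible:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetSketch:
    """Hypothetical stand-in for Dataset; not the real Airflow class."""

    name: str | None = None
    uri: str | None = None

    def __post_init__(self) -> None:
        # Both fields are optional, but at least one must be supplied.
        if not self.name and not self.uri:
            raise ValueError("either name or uri must be provided")


def dedupe(datasets):
    """Split datasets into (kept, dropped) so that names and URIs are unique.

    Assumption: the first dataset seen wins. The dropped list is returned
    so that a caller could emit a warning for each entry.
    """
    seen_names = set()
    seen_uris = set()
    kept = []
    dropped = []
    for dataset in datasets:
        name_clash = bool(dataset.name) and dataset.name in seen_names
        uri_clash = bool(dataset.uri) and dataset.uri in seen_uris
        if name_clash or uri_clash:
            dropped.append(dataset)
            continue
        if dataset.name:
            seen_names.add(dataset.name)
        if dataset.uri:
            seen_uris.add(dataset.uri)
        kept.append(dataset)
    return kept, dropped
```

Under these assumptions, two datasets with different names but the same URI collapse into one, which is the edge case where a warning would help.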
   
   Todo:
   
   * Fix existing tests. Many tests currently do something like 
`Dataset(uri)`. Should we fix all of them, or should we just make the 
positional argument the URI instead? Currently it’s the name.
   * DatasetAlias’s `add` interface (in `OutletEventAccessor`) currently only 
takes the URI. It should also accept the name and intelligently select the 
correct Dataset from the string value, preferring a name match over a URI 
match.
   * Add tests for various name-URI combinations.
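
A rough sketch of what "name over URI" resolution could look like. `DatasetRef` and `resolve` are hypothetical names for illustration only; the actual `add` interface is not implemented yet:

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class DatasetRef(NamedTuple):
    """Hypothetical (name, uri) pair standing in for a stored Dataset."""

    name: Optional[str]
    uri: Optional[str]


def resolve(value: str, datasets: list[DatasetRef]) -> Optional[DatasetRef]:
    """Return the dataset matching value, preferring a name match."""
    # Names and URIs are each unique, so plain dict lookups are enough.
    by_name = {d.name: d for d in datasets if d.name}
    by_uri = {d.uri: d for d in datasets if d.uri}
    match = by_name.get(value)
    if match is None:
        # Fall back to the URI only when no dataset has this name.
        match = by_uri.get(value)
    return match
```

The case worth testing is a string that is one dataset's name and another dataset's URI. Uniqueness is enforced per column, so this can happen, and in this sketch the name match wins.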

