dstandish commented on issue #25186: URL: https://github.com/apache/airflow/issues/25186#issuecomment-1190527854
yeah, you are probably right. i'm not saying that we should necessarily create a new standard. what's at stake for me is whether airflow takes on the responsibility of normalizing URIs, e.g. when storing them in the database. in other words, given that the URI field is the unique identifier for an airflow dataset, does airflow really need to normalize, such that when a user's code writes the same URI in two different ways, we treat both as the same dataset?

i'm reluctant to take on that responsibility, and i am more inclined to force the user to just make their code consistent. that is, just store the URI in a fully case-sensitive collation, and not trouble ourselves with, e.g., decomposing, lowercasing the hostname, recomposing and storing; or, alternatively, storing the hostname separately in a case-insensitive field and merging it back in on read, etc. i would rather not.

but i think we probably need to decide before 2.4, because a change to this behavior would be breaking. unless we want to mark datasets as experimental... which i doubt... i digress
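
for illustration only, here is a minimal sketch of what the "decompose, lowercase the hostname, recompose" step could look like with Python's standard `urllib.parse`. the helper name `lowercase_host` is hypothetical and this is not something airflow does; it just shows the kind of work being weighed here, and why two spellings of one URI stay distinct under a plain case-sensitive comparison.

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_host(uri: str) -> str:
    """Hypothetical helper: lowercase only the host part of a URI.

    The path, query and any userinfo keep their original case.
    """
    parts = urlsplit(uri)  # decompose (urlsplit also lowercases the scheme)
    # netloc may look like "user:pass@Host:port"; leave the userinfo untouched
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = f"{userinfo}{at}{hostport.lower()}"
    return urlunsplit(parts._replace(netloc=netloc))  # recompose


a = "https://Example.COM/Data/file.csv"
b = "https://example.com/Data/file.csv"

# case-sensitive storage: these are two different identifiers
print(a == b)
# with normalization: they collapse to the same identifier
print(lowercase_host(a) == lowercase_host(b))
```

without this step, the user's code simply has to spell the URI the same way everywhere.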
