potiuk commented on issue #25186:
URL: https://github.com/apache/airflow/issues/25186#issuecomment-1191135361

   I think we should at least detect "fat finger" problems. I.e. when someone 
*inside an Airflow installation* creates two different datasets with equivalent 
URLs, we should not allow that. We are able to do that very easily and warn 
the user. 
   
   I am perfectly ok with storing the dataset URI without normalisation. But 
at least we should have a unique index which will prevent the user from 
creating two different datasets with two equivalent (but different) URIs. This 
is not difficult. We can for example fully normalize the URI, convert it into 
a base64-encoded string and save it as "unique_id" or something similar in the 
database. Then whenever we are inserting a dataset with a different URI and the 
same "unique_id", we simply fail with:
   "This URI here is the same as that URI there". It should not be very complex, 
and I think it prevents users from making silly errors that will be difficult 
to debug otherwise.
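
   A minimal sketch of that idea, assuming a simple normalisation (lowercased 
scheme and host, sorted query parameters) and an in-memory dict standing in for 
the unique index in the database. The names `normalize_uri`, 
`generate_unique_id`, `DatasetRegistry` and `DuplicateDatasetError` are 
hypothetical and are not existing Airflow APIs; the real normalisation rules 
would still need to be agreed on.

```python
import base64
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_uri(uri: str) -> str:
    """Hypothetical normalisation: lowercase scheme/host, sort query params."""
    parts = urlsplit(uri)
    # `hostname` is already lowercased by urlsplit; rebuild netloc around it.
    netloc = parts.hostname or ""
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"
    if parts.username:
        userinfo = parts.username
        if parts.password:
            userinfo = f"{userinfo}:{parts.password}"
        netloc = f"{userinfo}@{netloc}"
    # Sort query parameters so that "?b=2&a=1" and "?a=1&b=2" compare equal.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    return urlunsplit((parts.scheme, netloc, parts.path, query, parts.fragment))


def generate_unique_id(uri: str) -> str:
    """Base64-encode the normalised URI, as suggested in the comment."""
    normalized = normalize_uri(uri).encode("utf-8")
    return base64.urlsafe_b64encode(normalized).decode("ascii")


class DuplicateDatasetError(ValueError):
    """Raised when two different URIs normalise to the same unique_id."""


class DatasetRegistry:
    """In-memory stand-in for a table with a unique index on unique_id."""

    def __init__(self) -> None:
        self._uri_by_unique_id: dict[str, str] = {}

    def register(self, uri: str) -> str:
        unique_id = generate_unique_id(uri)
        existing = self._uri_by_unique_id.get(unique_id)
        if existing is not None and existing != uri:
            raise DuplicateDatasetError(
                f"This URI {uri!r} is the same as that URI {existing!r}"
            )
        # The URI itself is stored as written, without normalisation.
        self._uri_by_unique_id[unique_id] = uri
        return unique_id
```

   With this sketch, registering `s3://bucket/key?a=1&b=2` and then 
`S3://BUCKET/key?b=2&a=1` would raise `DuplicateDatasetError`, while 
registering the exact same URI twice would be accepted.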
   
   There is no real drawback to it that I can think of, except that 
"generate_unique_id" has to use normalisation. Pretty much no performance 
penalty, much better user experience. Airflow helping the user make fewer 
mistakes.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
