ferruzzi commented on code in PR #27823: URL: https://github.com/apache/airflow/pull/27823#discussion_r1029660736
########## airflow/providers/amazon/aws/hooks/base_aws.py: ########## @@ -25,9 +25,13 @@ from __future__ import annotations Review Comment: Moving to code comments so these can be threaded discussions: > Someone of end users might be unhappy the fact that we actually started collect additional telemetry like. dag_id and ClassName even if it only hashes. So might be better collect by default only Airflow and Amazon Provider version and if required additional metadata then call optional callable which could controlled by config e.g. [aws] extra_botocore_user_agent_callable with full qualified path to callable e.g. some_vendor.safe.get_user_agent Yeah, it is possible that this may raise an eyebrow and that is something I put a lot of thought into. In the end I decided that Databricks is already adding the calling method to their user agent field [see link in the PR description] so I felt it was already approved by the community. And as you mentioned, the Dag ID is a hashed value that never contained any kind of secret/personal/private information even if it could somehow be "decoded". I am definitely open to hear what others think, but I think we're safe here (obviously, or I wouldn't have submitted it) In the long term I was thinking it would be neat to try to consolidate these various user agent collection/building methods that various different providers are doing and make some kind of contribution to core that vends these values. That would make the "let the user opt in to what they are willing to share" concept much easier. They could easily see what they are sharing universally across providers in one place, and providers who wish to add these values to their agent don't all have to rewrite the book. I basically copy/pasted the "get package version" method from Yandex [link in PR description] and if another provider wants to include it in their user agent, that's yet another bowl of copypasta. But a central provider/builder is outside the scope of this PR. At least three other providers are doing this their own way already, I'd like to get AWS added to that before we start trying to sort out how to standardize everything across providers. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
