ferruzzi commented on PR #27823:
URL: https://github.com/apache/airflow/pull/27823#issuecomment-1324041939

   > Someone of end users might be unhappy the fact that we actually started 
collect additional telemetry like.
   dag_id and ClassName even if it only hashes. So might be better collect by 
default only Airflow and Amazon Provider version and if required additional 
metadata then call optional callable which could controlled by config e.g. 
[aws] extra_botocore_user_agent_callable with full qualified path to callable 
e.g. some_vendor.safe.get_user_agent
   
   Yeah, it is possible that this may raise an eyebrow and that is something I 
put a lot of thought into.  In the end I decided that Databricks is already 
adding the calling method to their user agent field [see link in the PR 
description] so I felt it was already approved by the community.  And as you 
mentioned, the Dag ID is a hashed value that never contained any kind of 
secret/personal/private information even if it could somehow be "decoded".  I 
am definitely open to hear what others think, but I think we're safe here 
(obviously, or I wouldn't have submitted it)
   
   In the long term I was thinking it would be neat to try to consolidate these 
various user agent collection/building methods that various different providers 
are doing and make some kind of contribution to core that vends these values.  
That would make the "let the user opt in to what they are willing to share" 
concept much easier.  They could easily see what they are sharing universally 
across providers in one place, and providers who wish to add these values to 
their agent don't all have to rewrite the book.   I basically copy/pasted the 
"get package version" method from Yandex [link in PR description] and if 
another provider wants to include it in their user agent, that's yet another 
bowl of copypasta.  But a central provider/builder is outside the scope of this 
PR.  At least three other providers are doing this their own way already, I'd 
like to get AWS added to that before we start trying to sort out how to 
standardize everything across providers.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to