HADOOP-3733<https://issues.apache.org/jira/browse/HADOOP-3733> stripped out the 
user:password secret from the s3, s3a, s3n URLs on security grounds: 
everything logged Path entries without ever considering that they contained 
secret credentials.

But that turns out to break things, as noted in HADOOP-14439: you can no 
longer go Path -> String -> Path without authentication details being lost, and 
of course, guess how paths are often marshalled around? As strings (after all, 
they weren't serializable until recently).
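
To make the round-trip problem concrete, here is a minimal sketch. It is not the Hadoop code: it uses Python's urllib.parse as a stand-in for Path/URI, and strip_user_info, the bucket name and the credentials are all made up for illustration. It only shows that once the user:password part is dropped on the way to a string, parsing that string back cannot recover it.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_user_info(uri: str) -> str:
    """Return uri with any user:password@ prefix removed from the authority.

    Hypothetical helper: it mimics the effect of stripping secrets when a
    path is turned into a string; it is not Hadoop's implementation.
    """
    parts = urlsplit(uri)
    # Keep only what follows the last '@' in the authority (host and port).
    netloc = parts.netloc.rpartition("@")[2]
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Made-up credentials and bucket, for illustration only.
original = "s3a://AKIDEXAMPLE:secretkey@bucket/data/part-0000"

as_string = strip_user_info(original)  # "Path -> String" with secrets stripped
round_tripped = urlsplit(as_string)    # "String -> Path"

print(as_string)
print(round_tripped.username, round_tripped.password)
```

Anything that ships the string form to another process (a distcp task, say) therefore arrives without the login details, which is the breakage described above.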

Vinayakumar has proposed a patch which goes back to retaining the secrets, at 
least enough for distcp:

https://issues.apache.org/jira/browse/HADOOP-3733?focusedCommentId=16110297&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16110297

I think I'm going to go with this, once I get the tests & testing to go with 
it, and if it's enough to work with Spark too; targeting 2.8.2 if it's not too late.

If there's a risk, it's that if someone puts secrets into s3 URIs, the secrets 
are more likely to be logged. But even with the current code, there's no way to 
guarantee that the secrets will never be logged. The danger comes from having 
id:secret credentials in the URI, something people will be told off for doing.

