[ https://issues.apache.org/jira/browse/KNOX-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16046626#comment-16046626 ]

Kevin Risden commented on KNOX-949:
-----------------------------------

I agree that %2f in the URL is not a great idea. However, KNOX-690 broke
that capability for us, and calling WebHBase directly with %2f works with no
issues. Here is the test case that I used when doing a git bisect to find the
KNOX-690 commit.

https://gist.github.com/risdenk/afecc66d6fc0c9d665abd1ae5466f341

The template parsing is where I got stuck trying to understand how the URL is 
rewritten. There is an original variable that holds the correct URL almost all 
the way through parsing, and then the URL gets rewritten. I'll have to open my 
IDE to see if I still have the breakpoints. It looks as if the original URL is 
lost somewhere in the parsing, as though a new Template object is created at 
that point and the variable is dropped.

Double encoding %2f is a very interesting idea. I wonder if that exploits 
Java's new URL, where the first level of encoding is decoded but not the second?
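
To make the double-encoding idea concrete, here is a minimal sketch. It is not
Knox code: the class and method names are made up for illustration, and it
uses java.net.URLDecoder as a stand-in for whichever layer performs the single
decode (java.net.URL itself is documented as not encoding or decoding URL
components). It only shows that one decoding pass over %252f leaves %2f
behind, while a second pass turns it into a slash.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical illustration class; not part of Knox.
public class DoubleEncodingSketch {

    // One pass of percent-decoding, using UTF-8.
    static String decodeOnce(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always supported by the JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String singleEncoded = "row%2fkey";   // slash encoded once
        String doubleEncoded = "row%252fkey"; // the '%' itself encoded as %25

        // One pass already exposes the slash when encoded once.
        System.out.println(decodeOnce(singleEncoded));
        // One pass over the double-encoded form still carries %2f.
        System.out.println(decodeOnce(doubleEncoded));
        // Only a second pass produces the slash.
        System.out.println(decodeOnce(decodeOnce(doubleEncoded)));
    }
}
```

If a proxy decodes exactly once before forwarding, the double-encoded form
would reach the backend as %2f. Whether that is what happens inside Knox is
the open question above.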

> WebHDFS proxy replaces %20 encoded spaces in URL with + encoding
> ----------------------------------------------------------------
>
>                 Key: KNOX-949
>                 URL: https://issues.apache.org/jira/browse/KNOX-949
>             Project: Apache Knox
>          Issue Type: Bug
>    Affects Versions: 0.11.0
>            Reporter: Alex Willmer
>            Assignee: Larry McCay
>            Priority: Blocker
>             Fix For: 0.13.0
>
>         Attachments: KNOX-949-001.patch
>
>
> If a file with spaces in the name (e.g. {{foo bar.txt}}) is requested from 
> HDFS, through WebHDFS and Knox - then Knox rewrites the {{%20}} encoding in 
> the URL sent by the client, with {{+}} encoding (e.g. {{foo%20bar.txt}} -> 
> {{foo+bar.txt}}). This results in an HTTP 404 being returned by WebHDFS, and 
> hence by Knox. Requesting the same file directly from WebHDFS works. Example
> Client request
> {noformat}
> curl "https://<hostname>:18443/gateway/<cluster>/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN" \
>      -u <username>:<password> -k -s
> {noformat}
> Knox response body
> {noformat}
> {"exception":"FileNotFoundException",
>  "javaClassName":"java.io.FileNotFoundException",
>  "message":"File /docs/filename+with+spaces.pdf not found."}
> {noformat}
> Knox logs
> {noformat}
> ==> /var/log/hadoop/knox/gateway-audit.log <==
> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS||||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|unavailable|Request method: GET
> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authentication|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|
> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authentication|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|Groups: []
> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authorization|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|
> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|unavailable|Request method: GET
> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|success|Response status: 404
> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|Response status: 404
> ==> /var/log/hadoop/knox/gateway.log <==
> 2017-05-24 15:51:05,254 INFO  hadoop.gateway (KnoxLdapRealm.java:getUserDn(691)) - Computed userDn: uid=<username>,cn=users,cn=accounts,dc=<cluster> using dnTemplate for principal: <username>
> 2017-05-24 15:51:05,259 INFO  hadoop.gateway (AclsAuthorizationFilter.java:doFilter(85)) - Access Granted: true
> {noformat}
> Direct WebHDFS request for the same file
> {noformat}
> # curl -si -u: "http://<namenode>:50070/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN" --negotiate -L | head -n40
> HTTP/1.1 401 Authentication required
> Cache-Control: must-revalidate,no-cache,no-store
> Date: Wed, 24 May 2017 19:01:41 GMT
> Pragma: no-cache
> Date: Wed, 24 May 2017 19:01:41 GMT
> Pragma: no-cache
> X-FRAME-OPTIONS: SAMEORIGIN
> WWW-Authenticate: Negotiate
> Set-Cookie: hadoop.auth=; Path=/; HttpOnly
> Content-Type: text/html; charset=iso-8859-1
> Content-Length: 1533
> Server: Jetty(6.1.26.hwx)
> HTTP/1.1 307 TEMPORARY_REDIRECT
> Cache-Control: no-cache
> Expires: Wed, 24 May 2017 19:01:42 GMT
> Date: Wed, 24 May 2017 19:01:42 GMT
> Pragma: no-cache
> Expires: Wed, 24 May 2017 19:01:42 GMT
> Date: Wed, 24 May 2017 19:01:42 GMT
> Pragma: no-cache
> X-FRAME-OPTIONS: SAMEORIGIN
> WWW-Authenticate: Negotiate YGkGCSqGSIb3EgECAgIAb1owWKADAgEFoQMCAQ+iTDBKoAMCARKiQwRBQM/auuLcl2xey6wMp6EjCPJFSqK3snscxMzW7RvfgxOo7182GzD5N9jf+OWGr+tjpvlRX0c/7iTBfYKSetf4ekU=
> Set-Cookie: hadoop.auth="u=admin&p=admin@CYSAFA&t=kerberos&e=1495688502002&s=b7p35TgaxItAUTkKJuSXuynoq9E="; Path=/; HttpOnly
> Content-Type: application/octet-stream
> Location: http://<datanode3>:1022/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN&delegation=HgAFYWRtaW4FYWRtaW4AigFcO9YJ8ooBXF_ijfJFAxSBYFUnsXY3up11ZNIi4hIi__5RvRJXRUJIREZTIGRlbGVnYXRpb24PMTcyLjE4LjAuOTo4MDIw&namenoderpcaddress=<namenode>:8020&offset=0
> Content-Length: 0
> Server: Jetty(6.1.26.hwx)
> HTTP/1.1 200 OK
> Access-Control-Allow-Methods: GET
> Access-Control-Allow-Origin: *
> Content-Type: application/octet-stream
> Connection: close
> Content-Length: 13365618
> %����1.6
> <</Filter/FlateDecode/First 157/Length 5350/N 16/Type/ObjStm>>stream
> ...
> {noformat}
> See also
>  - http://mail-archives.apache.org/mod_mbox/knox-user/201705.mbox/%3C335C4DD06CF6C24EAA7A73F44D43D7CB4E6EB300%40SE-EX021.groupinfra.com%3E
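
For reference on the %20 versus + behaviour described in the issue above, the
sketch below contrasts the two encodings available in the JDK. It is an
illustration only and makes no claim about which call Knox actually uses; the
class and method names are hypothetical. java.net.URLEncoder produces
application/x-www-form-urlencoded output, where a space becomes +, while the
multi-argument java.net.URI constructor percent-encodes a space in a path as
%20.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

// Hypothetical illustration class; not part of Knox.
public class SpaceEncodingSketch {

    // Form encoding: intended for query parameters, turns ' ' into '+'.
    static String formEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always supported by the JVM.
            throw new IllegalStateException(e);
        }
    }

    // Path encoding: the URI(scheme, host, path, fragment) constructor
    // quotes characters that are illegal in a path, so ' ' becomes "%20".
    static String pathEncode(String path) {
        try {
            return new URI(null, null, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(formEncode("foo bar.txt"));       // foo+bar.txt
        System.out.println(pathEncode("/docs/foo bar.txt")); // /docs/foo%20bar.txt
    }
}
```

A + in a URL path is a literal plus sign, not a space, which is consistent
with WebHDFS reporting that filename+with+spaces.pdf does not exist.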



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
