[
https://issues.apache.org/jira/browse/KNOX-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15630956#comment-15630956
]
Jeffrey E Rodriguez commented on KNOX-767:
-------------------------------------------
Thanks again, Sumit.
I will check your suggestion about bypassing the XmlFilterReader when there are
no rules or filters to be applied.
So basically, in that case we would not use the XML event reader at all, just a
plain content reader without any XML parsing.
I think I can check that in a couple of days. That option would work for all
such cases, since we won't do any XML parsing when no rules or filters apply.
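
To make the idea concrete, here is a minimal sketch of the bypass in Python
rather than in Knox's own Java. It is not the Knox code: filter_body and the
"rules" list of callables are hypothetical names used only for illustration.
When no rules apply, the bytes are returned untouched; otherwise the body is
parsed and re-serialized, which is where the lexical form gets lost.

```python
import xml.etree.ElementTree as ET


def filter_body(body: bytes, rules) -> bytes:
    """Hypothetical stand-in for a rewrite filter.

    rules is a list of callables that each take the parsed root element
    and modify it in place (an assumption made for this sketch).
    """
    if not rules:
        # Nothing to rewrite: skip XML parsing and pass the bytes through.
        return body
    # Rules apply: parse, rewrite, and re-serialize. The output is
    # equivalent XML but not necessarily byte-identical to the input.
    root = ET.fromstring(body)
    for rule in rules:
        rule(root)
    return ET.tostring(root)


original = b"<document>\n<empty/>\n<![CDATA[<xyz>wibble</xyz>]]>\n</document>\n"

# No rules: the body comes back byte for byte.
unchanged = filter_body(original, [])

# One no-op rule: the body still goes through the parser and serializer.
reserialized = filter_body(original, [lambda root: None])
```

With an empty rule list the file from the report would be stored exactly as
uploaded; with any rule present it is still re-serialized, which is the
remaining case discussed next.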
In the other case, when we do have rules or filters, we may still have to
figure out how to avoid rewriting empty tags such as <empty/> as start/end tag
pairs. As far as XML syntax is concerned they are equivalent: a start/end tag
pair with no content == an empty-element tag.
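
A quick way to see that equivalence, sketched with Python's standard
xml.etree.ElementTree (not the parser Knox uses; the shape helper is a
hypothetical name for this example): both spellings parse to the same tree,
so a parser-based rewrite has no record of which form the client sent.

```python
import xml.etree.ElementTree as ET


def shape(elem):
    """Reduce an element to (tag, text, children), ignoring how it was written."""
    return (elem.tag, elem.text or "", [shape(child) for child in elem])


self_closing = ET.fromstring("<document><empty/></document>")
start_end = ET.fromstring("<document><empty></empty></document>")

# Both inputs produce the same parsed structure...
same_tree = shape(self_closing) == shape(start_end)

# ...so whatever form the serializer prefers is emitted for both.
same_output = ET.tostring(self_closing) == ET.tostring(start_end)

print(same_tree, same_output)
```

Preserving the original spelling would therefore need either the bypass above
or a reader that keeps track of the lexical form alongside the parsed events.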
I don't think you reported any issue with the CDATA handling.
> Knox transforms XML files written to WebHDFS
> --------------------------------------------
>
> Key: KNOX-767
> URL: https://issues.apache.org/jira/browse/KNOX-767
> Project: Apache Knox
> Issue Type: Bug
> Reporter: Sumit Gupta
> Assignee: Jeffrey E Rodriguez
> Fix For: 0.11.0
>
> Attachments: KNOX-767.patch
>
>
> When you write an XML file to WebHDFS through Knox with the Content-Type
> header set to text/xml or application/xml it is transformed by Knox so that
> empty tags like <xyz/> are written as <xyz></xyz> and CDATA is interpreted.
> This does not happen if written directly to WebHDFS. For example:
> {code}
> [root@hdp250 ~]# cat xxx
> <document>
> <empty/>
> <![CDATA[<xyz>wibble</xyz>]]>
> </document>
> [root@hdp250 ~]# curl -u guest:guest-password -i -k -X PUT
> "https://hdp250.local:8443/gateway/default/webhdfs/v1/tmp/xxx?op=CREATE&overwrite=true"
> HTTP/1.1 307 Temporary Redirect
> Date: Thu, 27 Oct 2016 19:25:35 GMT
> Set-Cookie:
> JSESSIONID=bt3timb9jl7k546fcntrj8s;Path=/gateway/default;Secure;HttpOnly
> Expires: Thu, 01 Jan 1970 00:00:00 GMT
> Set-Cookie: rememberMe=deleteMe; Path=/gateway/default; Max-Age=0;
> Expires=Wed, 26-Oct-2016 19:25:35 GMT
> Cache-Control: no-cache
> Expires: Thu, 27 Oct 2016 19:25:35 GMT
> Date: Thu, 27 Oct 2016 19:25:35 GMT
> Pragma: no-cache
> Expires: Thu, 27 Oct 2016 19:25:35 GMT
> Date: Thu, 27 Oct 2016 19:25:35 GMT
> Pragma: no-cache
> Content-Type: application/octet-stream
> X-FRAME-OPTIONS: SAMEORIGIN
> Location:
> https://hdp250.local:8443/gateway/default/webhdfs/data/v1/webhdfs/v1/tmp/xxx?_=AAAACAAAABAAAAEQ-XO5sAM86ubmjRdUYXJEZkpM4Vdv3vmIprBetQwfaKaZNN4uc9O1IN8jujDD9GpPPCDJCKxebul_GlCFxDIZzbkhZ1tnhY5rZ6V12SVJgLo5DxMxC8zECeaM4M8OFLqHxamNnvduuUkD5y23RJczzHHJ9SyYuG6yiCpDJKB_5MffZIWFaEEcYM7jOkjStZHU_7cjIg_vRJL2nFCVTWKf1FPkB00QCbXHN-Ua6MfEG8p2aoQB70tfVHnmhhnBWx2PZARJ-kHp42rrpA1yrI86v3Q-OGI4Ya3pnPRWhPj0wbdDr_p_FDinsw2KRu1_aRSIXXznmJ--aX6TflbBGZvDImkw4x0QM48UGFpOChaLtHk73rlMMUbbbAwOew0gJ2-69PuXiL4QB48
> Server: Jetty(6.1.26.hwx)
> Content-Length: 0
> [root@hdp250 ~]# curl -u guest:guest-password -i -k -X PUT -T xxx -H
> 'Content-Type: text/xml'
> "https://hdp250.local:8443/gateway/default/webhdfs/data/v1/webhdfs/v1/tmp/xxx?_=AAAACAAAABAAAAEQ-XO5sAM86ubmjRdUYXJEZkpM4Vdv3vmIprBetQwfaKaZNN4uc9O1IN8jujDD9GpPPCDJCKxebul_GlCFxDIZzbkhZ1tnhY5rZ6V12SVJgLo5DxMxC8zECeaM4M8OFLqHxamNnvduuUkD5y23RJczzHHJ9SyYuG6yiCpDJKB_5MffZIWFaEEcYM7jOkjStZHU_7cjIg_vRJL2nFCVTWKf1FPkB00QCbXHN-Ua6MfEG8p2aoQB70tfVHnmhhnBWx2PZARJ-kHp42rrpA1yrI86v3Q-OGI4Ya3pnPRWhPj0wbdDr_p_FDinsw2KRu1_aRSIXXznmJ--aX6TflbBGZvDImkw4x0QM48UGFpOChaLtHk73rlMMUbbbAwOew0gJ2-69PuXiL4QB48"
> HTTP/1.1 100 Continue
> HTTP/1.1 201 Created
> Date: Thu, 27 Oct 2016 19:25:54 GMT
> Set-Cookie:
> JSESSIONID=3o27jby7c2a6mdpxducddqac;Path=/gateway/default;Secure;HttpOnly
> Expires: Thu, 01 Jan 1970 00:00:00 GMT
> Set-Cookie: rememberMe=deleteMe; Path=/gateway/default; Max-Age=0;
> Expires=Wed, 26-Oct-2016 19:25:54 GMT
> Location: https://hdp250.local:8443/gateway/default/webhdfs/v1/tmp/xxx
> Connection: close
> Server: Jetty(9.2.15.v20160210)
> [root@hdp250 ~]# hdfs dfs -cat /tmp/xxx
> <?xml version="1.0" standalone="no"?><document>
> <empty></empty>
> <xyz>wibble</xyz>
> </document>
> {code}