[ https://issues.apache.org/jira/browse/JCLOUDS-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jacob Nguyen updated JCLOUDS-1638:
----------------------------------
    Description: revised. The previous text differed only in the sentence introducing the second example, which read "But with the urlParam:" without naming the parameter. The current text is quoted below.

> SAXParseException on S3 Listing
> -------------------------------
>
>                 Key: JCLOUDS-1638
>                 URL: https://issues.apache.org/jira/browse/JCLOUDS-1638
>             Project: jclouds
>          Issue Type: Bug
>    Affects Versions: 2.5.0
>            Reporter: Jacob Nguyen
>            Assignee: Andrew Gaul
>            Priority: Major
>
> {noformat}
> java.lang.RuntimeException: request: GET https://sclas-cloud-storage-master.s3.amazonaws.com/?delimiter=/&prefix=Data/57-2943/10-8-20/&max-keys=1000 HTTP/1.1; response: HTTP/1.1 200 OK; cause: java.lang.RuntimeException: request: GET https://sclas-cloud-storage-master.s3.amazonaws.com/?delimiter=/&prefix=Data/57-2943/10-8-20/&max-keys=1000 HTTP/1.1; error at 323:2 in document ; cause: org.xml.sax.SAXParseException; lineNumber: 2; columnNumber: 323; Character reference "" is an invalid XML character.
> 	at org.jclouds.http.functions.ParseSax.addDetailsAndPropagate(ParseSax.java:174)
> 	at org.jclouds.http.functions.ParseSax.addDetailsAndPropagate(ParseSax.java:146)
> 	at org.jclouds.http.functions.ParseSax.apply(ParseSax.java:86)
> 	at org.jclouds.http.functions.ParseSax.apply(ParseSax.java:52)
> 	at org.jclouds.rest.internal.InvokeHttpMethod.invoke(InvokeHttpMethod.java:91)
> 	at org.jclouds.rest.internal.InvokeHttpMethod.apply(InvokeHttpMethod.java:74)
> 	at org.jclouds.rest.internal.InvokeHttpMethod.apply(InvokeHttpMethod.java:45)
> 	at org.jclouds.rest.internal.DelegatesToInvocationFunction.handle(DelegatesToInvocationFunction.java:156)
> 	at org.jclouds.rest.internal.DelegatesToInvocationFunction.invoke(DelegatesToInvocationFunction.java:123)
> 	at jdk.proxy2/jdk.proxy2.$Proxy235.listBucket(Unknown Source)
> 	at org.jclouds.s3.blobstore.S3BlobStore.list(S3BlobStore.java:177)
> {noformat}
> When there's a control character in a folder path in S3, jclouds can't parse the listing response, because the SAX parser throws SAXParseException.
> Can there be an option that at least lets us forward the encoding-type param?
> https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjects.html#API_ListObjects_RequestSyntax
> And URL-decode the result for us, so that listing becomes possible? This bug currently prevents us from listing any children of a root folder if one of the children contains control characters.
>
> Here's an example XML response from S3 when listing objects with cURL:
> {noformat}
> <?xml version="1.0" encoding="UTF-8"?>
> <ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Name>cloudsync-performance-tests</Name><Prefix>some/</Prefix><Marker></Marker><MaxKeys>1000</MaxKeys><Delimiter>/</Delimiter><IsTruncated>false</IsTruncated><CommonPrefixes><Prefix>some/test/</Prefix></CommonPrefixes></ListBucketResult>
> {noformat}
> The child folder of 'some' is returned as
> {noformat}
> <Prefix>some/test/</Prefix>
> {noformat}
> which can't be parsed.
>
> But with the URL parameter &encoding-type=url:
> {noformat}
> <?xml version="1.0" encoding="UTF-8"?>
> <ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Name>cloudsync-performance-tests</Name><Prefix>some/</Prefix><Marker></Marker><MaxKeys>1000</MaxKeys><Delimiter>/</Delimiter><EncodingType>url</EncodingType><IsTruncated>false</IsTruncated><CommonPrefixes><Prefix>some/test%10/</Prefix></CommonPrefixes></ListBucketResult>
> {noformat}
> the same child folder is returned as
> {noformat}
> <Prefix>some/test%10/</Prefix>
> {noformat}
> which can probably be parsed.
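
The difference between the two responses in the description can be reproduced outside jclouds with the JDK's own SAX parser. The sketch below is illustrative only and is not jclouds code: the class and method names (`EncodingTypeSketch`, `parsePrefixes`, `decodeKey`) are hypothetical, the XML is trimmed to the relevant elements, and the control character is written as `&#x10;` on the assumption that `%10` in the encoded response stands for U+0010 (the exact character reference S3 sent is not visible in the report).

```java
import java.io.StringReader;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of jclouds.
class EncodingTypeSketch {

    /** Returns the text of every <Prefix> element; throws if the XML is rejected. */
    static List<String> parsePrefixes(String xml) throws Exception {
        final List<String> prefixes = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // text of the element currently open

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                text = new StringBuilder();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if ("Prefix".equals(qName) && text != null) {
                    prefixes.add(text.toString());
                }
                text = null;
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return prefixes;
    }

    /** Decodes a key returned with encoding-type=url. Note: URLDecoder also maps '+' to a space. */
    static String decodeKey(String encoded) throws Exception {
        return URLDecoder.decode(encoded, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // Assumed form of the unencoded response: a character reference to U+0010.
        String raw = "<ListBucketResult><CommonPrefixes><Prefix>some/test&#x10;/</Prefix>"
                + "</CommonPrefixes></ListBucketResult>";
        // Same listing as returned with &encoding-type=url.
        String encoded = "<ListBucketResult><EncodingType>url</EncodingType><CommonPrefixes>"
                + "<Prefix>some/test%10/</Prefix></CommonPrefixes></ListBucketResult>";

        try {
            parsePrefixes(raw);
            System.out.println("raw listing parsed");
        } catch (SAXException e) {
            System.out.println("raw listing rejected: " + e.getMessage());
        }

        for (String prefix : parsePrefixes(encoded)) {
            String key = decodeKey(prefix);
            System.out.println(prefix + " decodes to a key of length " + key.length());
        }
    }
}
```

One caveat on the decoding step: `URLDecoder` follows form-encoding rules and turns `+` into a space, so whether it matches S3's encoding of every key would need to be checked before relying on it.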