Vishnu Vardhan created HADOOP-14142:
---------------------------------------
Summary: S3A - Adding unexpected prefix
Key: HADOOP-14142
URL: https://issues.apache.org/jira/browse/HADOOP-14142
Project: Hadoop Common
Issue Type: Bug
Reporter: Vishnu Vardhan
Priority: Critical

Hi,

S3A seems to add an unexpected prefix to my S3 path. Specifically, the following line in the debug log below is unexpected:

> GET /myBkt8/?max-keys=1&prefix=user%2Fvardhan%2F&delimiter=%2F HTTP/1.1

It is not clear where this "prefix" comes from or why it is added.

I executed the following commands:

sc.setLogLevel("DEBUG")
sc.hadoopConfiguration.set("fs.s3a.impl","org.apache.hadoop.fs.s3a.S3AFileSystem")
sc.hadoopConfiguration.set("fs.s3a.endpoint","webscaledemo.netapp.com:8082")
sc.hadoopConfiguration.set("fs.s3a.access.key","")
sc.hadoopConfiguration.set("fs.s3a.secret.key","")
sc.hadoopConfiguration.set("fs.s3a.path.style.access","false")
val s3Rdd = sc.textFile("s3a://myBkt98")
s3Rdd.count()

---- debug log is below

application/x-www-form-urlencoded; charset=utf-8
Thu, 02 Mar 2017 22:40:25 GMT
/myBkt8/"
17/03/02 14:40:25 DEBUG request: Sending Request: GET https://webscaledemo.netapp.com:8082 /myBkt8/ Parameters: (max-keys: 1, prefix: user/vardhan/, delimiter: /, ) Headers: (Authorization: AWS 2SNAJYEMQU45YPVYC89D:M8GbLXUuAJ2w5pGx4WJ6hJF3324=, User-Agent: aws-sdk-java/1.7.4 Mac_OS_X/10.12.3 Java_HotSpot(TM)_64-Bit_Server_VM/25.60-b23/1.8.0_60, Date: Thu, 02 Mar 2017 22:40:25 GMT, Content-Type: application/x-www-form-urlencoded; charset=utf-8, )
17/03/02 14:40:25 DEBUG PoolingClientConnectionManager: Connection request: [route: {s}->https://webscaledemo.netapp.com:8082][total kept alive: 0; route allocated: 0 of 15; total allocated: 0 of 15]
17/03/02 14:40:25 DEBUG PoolingClientConnectionManager: Connection leased: [id: 10][route: {s}->https://webscaledemo.netapp.com:8082][total kept alive: 0; route allocated: 1 of 15; total allocated: 1 of 15]
17/03/02 14:40:25 DEBUG DefaultClientConnectionOperator: Connecting to webscaledemo.netapp.com:8082
17/03/02 14:40:25 DEBUG PoolingClientConnectionManager: Closing connections idle longer than 60 SECONDS
17/03/02 14:40:25 DEBUG PoolingClientConnectionManager: Closing connections idle longer than 60 SECONDS
17/03/02 14:40:26 DEBUG RequestAddCookies: CookieSpec selected: default
17/03/02 14:40:26 DEBUG RequestAuthCache: Auth cache not set in the context
17/03/02 14:40:26 DEBUG RequestProxyAuthentication: Proxy auth state: UNCHALLENGED
17/03/02 14:40:26 DEBUG SdkHttpClient: Attempt 1 to execute request
17/03/02 14:40:26 DEBUG DefaultClientConnection: Sending request: GET /myBkt8/?max-keys=1&prefix=user%2Fvardhan%2F&delimiter=%2F HTTP/1.1
17/03/02 14:40:26 DEBUG wire: >> "GET /myBkt8/?max-keys=1&prefix=user%2Fvardhan%2F&delimiter=%2F HTTP/1.1[\r][\n]"
17/03/02 14:40:26 DEBUG wire: >> "Host: webscaledemo.netapp.com:8082[\r][\n]"
17/03/02 14:40:26 DEBUG wire: >> "Authorization: AWS 2SNAJYEMQU45YPVYC89D:M8GbLXUuAJ2w5pGx4WJ6hJF3324=[\r][\n]"
17/03/02 14:40:26 DEBUG wire: >> "User-Agent: aws-sdk-java/1.7.4 Mac_OS_X/10.12.3 Java_HotSpot(TM)_64-Bit_Server_VM/25.60-b23/1.8.0_60[\r][\n]"
17/03/02 14:40:26 DEBUG wire: >> "Date: Thu, 02 Mar 2017 22:40:25 GMT[\r][\n]"
17/03/02 14:40:26 DEBUG wire: >> "Content-Type: application/x-www-form-urlencoded; charset=utf-8[\r][\n]"
17/03/02 14:40:26 DEBUG wire: >> "Connection: Keep-Alive[\r][\n]"
17/03/02 14:40:26 DEBUG wire: >> "[\r][\n]"
17/03/02 14:40:26 DEBUG headers: >> GET /myBkt8/?max-keys=1&prefix=user%2Fvardhan%2F&delimiter=%2F HTTP/1.1
17/03/02 14:40:26 DEBUG headers: >> Host: webscaledemo.netapp.com:8082
17/03/02 14:40:26 DEBUG headers: >> Authorization: AWS 2SNAJYEMQU45YPVYC89D:M8GbLXUuAJ2w5pGx4WJ6hJF3324=
17/03/02 14:40:26 DEBUG headers: >> User-Agent: aws-sdk-java/1.7.4 Mac_OS_X/10.12.3 Java_HotSpot(TM)_64-Bit_Server_VM/25.60-b23/1.8.0_60
17/03/02 14:40:26 DEBUG headers: >> Date: Thu, 02 Mar 2017 22:40:25 GMT
17/03/02 14:40:26 DEBUG headers: >> Content-Type: application/x-www-form-urlencoded; charset=utf-8
17/03/02 14:40:26 DEBUG headers: >> Connection: Keep-Alive
17/03/02 14:40:26 DEBUG wire: << "HTTP/1.1 200 OK[\r][\n]"
17/03/02 14:40:26 DEBUG wire: << "Date: Thu, 02 Mar 2017 22:40:26 GMT[\r][\n]"
17/03/02 14:40:26 DEBUG wire: << "Connection: KEEP-ALIVE[\r][\n]"
17/03/02 14:40:26 DEBUG wire: << "Server: StorageGRID/10.3.0.1[\r][\n]"
17/03/02 14:40:26 DEBUG wire: << "x-amz-request-id: 563477649[\r][\n]"
17/03/02 14:40:26 DEBUG wire: << "Content-Length: 266[\r][\n]"
17/03/02 14:40:26 DEBUG wire: << "Content-Type: application/xml[\r][\n]"
17/03/02 14:40:26 DEBUG wire: << "[\r][\n]"
17/03/02 14:40:26 DEBUG DefaultClientConnection: Receiving response: HTTP/1.1 200 OK
17/03/02 14:40:26 DEBUG headers: << HTTP/1.1 200 OK
17/03/02 14:40:26 DEBUG headers: << Date: Thu, 02 Mar 2017 22:40:26 GMT
17/03/02 14:40:26 DEBUG headers: << Connection: KEEP-ALIVE
17/03/02 14:40:26 DEBUG headers: << Server: StorageGRID/10.3.0.1
17/03/02 14:40:26 DEBUG headers: << x-amz-request-id: 563477649
17/03/02 14:40:26 DEBUG headers: << Content-Length: 266
17/03/02 14:40:26 DEBUG headers: << Content-Type: application/xml
17/03/02 14:40:26 DEBUG SdkHttpClient: Connection can be kept alive indefinitely
17/03/02 14:40:26 DEBUG XmlResponsesSaxParser: Sanitizing XML document destined for handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
17/03/02 14:40:26 DEBUG wire: << "<?xml version="1.0" encoding="UTF-8"?>[\n]"
17/03/02 14:40:26 DEBUG wire: << "<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Name>myBkt8</Name><Prefix>user/vardhan/</Prefix><Marker></Marker><MaxKeys>1</MaxKeys><Delimiter>/</Delimiter><IsTruncated>false</IsTruncated></ListBucketResult>"
17/03/02 14:40:26 DEBUG PoolingClientConnectionManager: Connection [id: 10][route: {s}->https://webscaledemo.netapp.com:8082] can be kept alive indefinitely
17/03/02 14:40:26 DEBUG PoolingClientConnectionManager: Connection released: [id: 10][route: {s}->https://webscaledemo.netapp.com:8082][total kept alive: 1; route allocated: 1 of 15; total allocated: 1 of 15]
17/03/02 14:40:26 DEBUG XmlResponsesSaxParser: Parsing XML response document with handler: class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
17/03/02 14:40:26 DEBUG XmlResponsesSaxParser: Examining listing for bucket: myBkt8
17/03/02 14:40:26 DEBUG request: Received successful response: 200, AWS Request ID: 563477649
17/03/02 14:40:26 DEBUG S3AFileSystem: Not Found: s3a://myBkt8/user/vardhan
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: s3a://myBkt8
  at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
  at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
  at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
  at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
  at org.apache.spark.rdd.RDD.count(RDD.scala:1157)
  ... 53 elided

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
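The request line quoted in the report is an S3 ListObjects (v1) call: a `GET` on the bucket with `max-keys=1`, a `prefix` and a `/` delimiter, which looks like a probe for whether anything exists "under" a directory-like key. As a minimal sketch, assuming path-style addressing (as seen in the log) and a probe prefix formed from the key plus a trailing `/`, the code below rebuilds that request line. `ListProbe.listProbeRequestLine` is a hypothetical helper, not part of S3A or the AWS SDK; it only shows that `prefix=user%2Fvardhan%2F` is the URL-encoded form of `user/vardhan/`.

```scala
import java.net.URLEncoder

object ListProbe {
  // URL-encode one query-parameter value as UTF-8; URLEncoder turns '/'
  // into "%2F", which is what appears in the logged request line.
  private def enc(value: String): String = URLEncoder.encode(value, "UTF-8")

  // Hypothetical helper (not an S3A or AWS SDK API): build a path-style
  // ListObjects (v1) request line that asks for at most one entry under
  // "<key>/", using "/" as the delimiter.
  def listProbeRequestLine(bucket: String, key: String): String = {
    val prefix = if (key.isEmpty || key.endsWith("/")) key else key + "/"
    val query = Seq("max-keys" -> "1", "prefix" -> prefix, "delimiter" -> "/")
      .map { case (name, value) => s"$name=${enc(value)}" }
      .mkString("&")
    s"GET /$bucket/?$query HTTP/1.1"
  }
}
```

For example, `ListProbe.listProbeRequestLine("myBkt8", "user/vardhan")` yields the same request line as the one in the wire log, so the only surprising part of the request is the key `user/vardhan` itself, not its encoding.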
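The StorageGRID endpoint answered the probe in the log with `200 OK` and a `ListBucketResult` that contains no `<Contents>` (objects) and no `<CommonPrefixes>` (sub-"directories"). That is consistent with S3A then logging `Not Found: s3a://myBkt8/user/vardhan`. A small sketch of that check using the JDK's DOM parser follows; `ListingCheck.isEmptyListing` is a hypothetical name, and treating an empty listing as "path does not exist" is an assumption about how S3A interprets the probe, not a quote of its code.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory

object ListingCheck {
  // Hypothetical helper: true when a ListBucketResult document lists no
  // objects (<Contents>) and no common prefixes (<CommonPrefixes>).
  def isEmptyListing(xml: String): Boolean = {
    val factory = DocumentBuilderFactory.newInstance()
    // The S3 response declares a default namespace, so parse namespace-aware.
    factory.setNamespaceAware(true)
    val doc = factory.newDocumentBuilder()
      .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
    // "*" matches elements in any namespace, including the S3 2006-03-01 one.
    val contents = doc.getElementsByTagNameNS("*", "Contents").getLength
    val prefixes = doc.getElementsByTagNameNS("*", "CommonPrefixes").getLength
    contents == 0 && prefixes == 0
  }
}
```

Feeding it the response body from the wire log returns `true`: the listing under `user/vardhan/` is empty, so the failure is about which key was probed, not about the server's reply.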
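The `Not Found: s3a://myBkt8/user/vardhan` line suggests that the bucket-only input path was resolved against a per-user working directory before being turned into an object key, which would explain where `user/vardhan/` comes from. The sketch below is a hypothetical model of that inference, not S3A's or Hadoop's actual path-handling code: it assumes the working directory is `/user/<user.name>` and that a URI with an empty path component counts as relative. The name `KeyModel.objectKey` is made up for illustration.

```scala
import java.net.URI

object KeyModel {
  // Hypothetical model, NOT Hadoop's implementation: a path whose URI has
  // no leading '/' (including an empty path, as in "s3a://bucket") is
  // treated as relative and resolved against an assumed working directory
  // of /user/<userName>.
  def objectKey(pathUri: String, userName: String): String = {
    val uri = new URI(pathUri)
    val path = Option(uri.getPath).getOrElse("")
    val absolute =
      if (path.startsWith("/")) path
      else s"/user/$userName" + (if (path.isEmpty) "" else "/" + path)
    // Object keys carry no leading '/'; a trailing '/' is dropped too.
    absolute.stripPrefix("/").stripSuffix("/")
  }
}
```

Under this model, `s3a://myBkt8` maps to the key `user/vardhan` (hence the probe prefix `user/vardhan/`), while `s3a://myBkt8/` maps to the bucket root. Whether adding the trailing slash avoids the problem on a given Hadoop version is an assumption to verify, not something the report confirms.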