[jira] [Created] (SPARK-20061) Reading a file with colon (:) from S3 fails with URISyntaxException

Michel Lemay (JIRA) Wed, 22 Mar 2017 08:05:58 -0700

Michel Lemay created SPARK-20061:
------------------------------------

             Summary: Reading a file with colon (:) from S3 fails with URISyntaxException
                 Key: SPARK-20061
                 URL: https://issues.apache.org/jira/browse/SPARK-20061
             Project: Spark
          Issue Type: Bug
          Components: Structured Streaming
    Affects Versions: 2.1.0
         Environment: EC2, AWS
            Reporter: Michel Lemay



When reading a bunch of files from S3 using wildcards, the read fails with the following exception:

{code}
scala> val fn = "s3a://mybucket/path/*/"
scala> val ds = spark.readStream.schema(schema).json(fn)

java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: 2017-01-06T20:33:45.255-analyticsqa-49569270507599054034141623773442922465540524816321216514.json
  at org.apache.hadoop.fs.Path.initialize(Path.java:205)
  at org.apache.hadoop.fs.Path.<init>(Path.java:171)
  at org.apache.hadoop.fs.Path.<init>(Path.java:93)
  at org.apache.hadoop.fs.Globber.glob(Globber.java:241)
  at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1657)
  at org.apache.spark.deploy.SparkHadoopUtil.globPath(SparkHadoopUtil.scala:237)
  at org.apache.spark.deploy.SparkHadoopUtil.globPathIfNecessary(SparkHadoopUtil.scala:243)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$2.apply(DataSource.scala:131)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$2.apply(DataSource.scala:127)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
  at scala.collection.immutable.List.flatMap(List.scala:344)
  at org.apache.spark.sql.execution.datasources.DataSource.tempFileIndex$lzycompute$1(DataSource.scala:127)
  at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$tempFileIndex$1(DataSource.scala:124)
  at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$getOrInferFileFormatSchema(DataSource.scala:138)
  at org.apache.spark.sql.execution.datasources.DataSource.sourceSchema(DataSource.scala:229)
  at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo$lzycompute(DataSource.scala:87)
  at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo(DataSource.scala:87)
  at org.apache.spark.sql.execution.streaming.StreamingRelation$.apply(StreamingRelation.scala:30)
  at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:124)
  at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:133)
  at org.apache.spark.sql.streaming.DataStreamReader.json(DataStreamReader.scala:181)
  ... 50 elided
Caused by: java.net.URISyntaxException: Relative path in absolute URI: 2017-01-06T20:33:45.255-analyticsqa-49569270507599054034141623773442922465540524816321216514.json
  at java.net.URI.checkPath(URI.java:1823)
  at java.net.URI.<init>(URI.java:745)
  at org.apache.hadoop.fs.Path.initialize(Path.java:202)
  ... 73 more
{code}

The file in question sits at the root of s3a://mybucket/path/

{code}
aws s3 ls s3://mybucket/path/

                           PRE subfolder1/
                           PRE subfolder2/
...
2017-01-06 20:33:46       1383 2017-01-06T20:33:45.255-analyticsqa-49569270507599054034141623773442922465540524816321216514.json
...
{code}
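A likely mechanism, inferred from the stack trace rather than confirmed in this report: the frames under Globber.glob suggest a child Path is built from the bare file name, and a Hadoop Path parsed from a string treats everything before the first colon as a URI scheme when that colon comes before any slash. For this file name the "scheme" becomes 2017-01-06T20 and the remaining path 33:45.255-... is relative, which java.net.URI rejects with the "Relative path in absolute URI" message seen above. The sketch below reproduces that parse using only java.net.URI. ColonPathDemo, splitScheme and toUri are hypothetical helpers that approximate, but are not, org.apache.hadoop.fs.Path, and the file name is a shortened stand-in for the real one.

```scala
import java.net.{URI, URISyntaxException}

object ColonPathDemo {
  // Hypothetical approximation of Hadoop-style scheme detection: the text
  // before the first ':' is taken as a scheme if that ':' precedes any '/'.
  def splitScheme(pathString: String): (Option[String], String) = {
    val colon = pathString.indexOf(':')
    val slash = pathString.indexOf('/')
    if (colon != -1 && (slash == -1 || colon < slash))
      (Some(pathString.substring(0, colon)), pathString.substring(colon + 1))
    else
      (None, pathString)
  }

  // Builds a URI with the multi-argument java.net.URI constructor, which
  // rejects a non-null scheme combined with a path not starting with '/'.
  def toUri(pathString: String): Either[String, URI] = {
    val (scheme, path) = splitScheme(pathString)
    try Right(new URI(scheme.orNull, null, path, null, null))
    catch { case e: URISyntaxException => Left(e.getReason) }
  }

  def main(args: Array[String]): Unit = {
    // Shortened stand-in for the object name in the listing above.
    val name = "2017-01-06T20:33:45.255-analyticsqa-example.json"
    println(splitScheme(name)) // (Some(2017-01-06T20),33:45.255-analyticsqa-example.json)
    println(toUri(name))       // Left(Relative path in absolute URI)
  }
}
```

In this sketch a name with no colon, or with a '/' before the first colon (for example ./2017-01-06T20:33:45.255-x.json), is not split and parses as a plain relative path.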


Removing the wildcard from the path makes it work, but it obviously misses all files in subdirectories.





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
