[jira] [Updated] (SPARK-19340) Opening a file in CSV format will result in an exception if the filename contains special characters

Reza Safi (JIRA) Tue, 24 Jan 2017 11:12:50 -0800

     [ https://issues.apache.org/jira/browse/SPARK-19340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Reza Safi updated SPARK-19340:
------------------------------
    Description: 
If you try to open a file whose name matches {noformat} "*{*}*.*" 
{noformat} or {noformat} "*[*]*.*" {noformat} using the CSV format, you will get 
an "org.apache.spark.sql.AnalysisException: Path does not exist" error, whether 
the file is local or on HDFS.
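
As a rough illustration of which file names the two patterns above cover, the following sketch flags names that contain braces or square brackets. The helper name hasSuspectChars and the sample file names (other than "test{00-1}.txt") are hypothetical and are not part of Spark; the check only mirrors the patterns quoted in this report.

```scala
// Characters taken from the two name patterns in the report:
// "*{*}*.*" and "*[*]*.*". This set is an assumption based on the report
// only, not an exhaustive list of characters Spark treats specially.
val suspectChars: Set[Char] = Set('{', '}', '[', ']')

// Hypothetical helper: true if the file name contains any of those characters.
def hasSuspectChars(fileName: String): Boolean =
  fileName.exists(suspectChars.contains)

// Sample names for illustration; only the first comes from the report.
val names = Seq("test{00-1}.txt", "data[1].csv", "plain.csv")
val flagged = names.filter(hasSuspectChars)
```

A check like this could be run over a directory listing before a read to see which inputs might be affected.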

This bug can be reproduced on master and all other Spark 2 branches.
To reproduce:
# Create a file named like "test{00-1}.txt" in a local directory (for example, 
/Users/reza/test/test{00-1}.txt)
# Run spark-shell
# Execute this command:
val df = spark.read.option("header", "false").csv("/Users/reza/test/*.txt")

You will see the following stack trace:
{noformat}
org.apache.spark.sql.AnalysisException: Path does not exist: file:/Users/rezasafi/bck/sp2/test\{00-01\}.txt;
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:367)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:360)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
  at scala.collection.immutable.List.flatMap(List.scala:344)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:360)
  at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.readText(CSVFileFormat.scala:208)
  at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.inferSchema(CSVFileFormat.scala:63)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:174)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:174)
  at scala.Option.orElse(Option.scala:289)
  at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:173)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:377)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:158)
  at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:423)
  at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:360)
  ... 48 elided
{noformat}
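
The first reproduction step can also be scripted, which may be convenient for a regression test. This is a minimal sketch, assuming a temporary directory is an acceptable stand-in for /Users/reza/test; the function name createBraceFile, the directory prefix, and the file contents are hypothetical. The Spark call is left as a comment because it needs a running spark-shell session.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical helper: creates a scratch directory holding one file whose
// name contains '{' and '}', matching the example name in the report.
def createBraceFile(): Path = {
  val dir = Files.createTempDirectory("spark-19340-")
  val file = dir.resolve("test{00-1}.txt")
  // Arbitrary sample rows; the report does not say what the file contains.
  Files.write(file, "a,b,c\n1,2,3\n".getBytes(StandardCharsets.UTF_8))
  file
}

val file = createBraceFile()

// In spark-shell, the read from the report would then look like:
//   val df = spark.read.option("header", "false").csv(s"${file.getParent}/*.txt")
```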



  was:
If you want to open a file that its name is like  {noformat} "*{*}*.*" 
{noformat} or {noformat} "*[*]*.*" {noformat} using CSV format, you will get 
the "org.apache.spark.sql.AnalysisException: Path does not exist" whether the 
file is a local file or on hdfs.

This bug can be reproduced on master and all other Spark 2 branches.
To reproduce:
# Create a file like "test{00-1}.txt" on a local directory (like in 
/Users/reza/test/test{00-1}.txt)
# Run spark-shell
# Execute this command:
val df=spark.read.option("header","false").csv("/Users/reza/test/test{00-01}.txt")

You will see the following stack trace:
{noformat}
org.apache.spark.sql.AnalysisException: Path does not exist: file:/Users/reza/test/test\{00-01\}.txt;
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:367)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:360)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
  at scala.collection.immutable.List.flatMap(List.scala:344)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:360)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:158)
  at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:423)
  at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:360)
  ... 48 elided
{noformat}




> Opening a file in CSV format will result in an exception if the filename 
> contains special characters
> ----------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-19340
>                 URL: https://issues.apache.org/jira/browse/SPARK-19340
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0, 2.0.1, 2.1.0, 2.2.0
>            Reporter: Reza Safi



