[https://issues.apache.org/jira/browse/SPARK-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711256#comment-14711256]
Apache Spark commented on SPARK-10185:
--------------------------------------
User 'koertkuipers' has created a pull request for this issue:
https://github.com/apache/spark/pull/8416
> Spark SQL does not handle comma separated paths on Hadoop FileSystem
> --------------------------------------------------------------------
>
> Key: SPARK-10185
> URL: https://issues.apache.org/jira/browse/SPARK-10185
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.4.1
> Reporter: koert kuipers
>
> Spark SQL uses a Map[String, String] for data source settings. As a
> consequence, the only way to pass in multiple paths (something that Hadoop's
> FileInputFormat supports) is to pass in a comma separated list. For
> example:
> sqlContext.read.format("json").load("dir1,dir2")
> or
> sqlContext.read.format("json").option("path", "dir1,dir2").load()
> However, in this case ResolvedDataSource does not handle the comma delimited
> paths correctly for a HadoopFsRelationProvider: it treats the multiple comma
> delimited paths as a single path.
> For example, if I pass in "dir1,dir2" as the path, it will make dir1 qualified
> but ignore dir2 (presumably because it simply treats it as part of dir1). If
> globs are involved, it always returns an empty array of paths (because a glob
> with a comma in it does not match anything).
> I think it's important to handle commas to pass in multiple paths, since the
> framework does not provide an alternative. In some cases, like Parquet, the
> code simply bypasses ResolvedDataSource to support multiple paths, but to me
> this is a workaround that should be discouraged. (A sketch of splitting and
> qualifying comma separated paths follows below the quoted description.)
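
For illustration, here is a minimal sketch of the kind of handling the
description asks for, written against the public Hadoop FileSystem API: split
the comma separated string, then qualify and glob each path individually. The
helper name resolvePaths is hypothetical, and this is not the contents of the
linked pull request.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path

    // Split a comma separated path string, qualify each path against its
    // FileSystem, and expand globs per path instead of treating the whole
    // string as a single path.
    def resolvePaths(pathString: String, hadoopConf: Configuration): Seq[Path] = {
      pathString.split(",").map(_.trim).filter(_.nonEmpty).toSeq.flatMap { p =>
        val path = new Path(p)
        val fs = path.getFileSystem(hadoopConf)
        val qualified = path.makeQualified(fs.getUri, fs.getWorkingDirectory)
        // globStatus returns null when the pattern has no glob and the path
        // does not exist; keep the qualified path in that case so a missing
        // directory still surfaces a clear error downstream.
        Option(fs.globStatus(qualified)) match {
          case Some(statuses) if statuses.nonEmpty => statuses.map(_.getPath).toSeq
          case _ => Seq(qualified)
        }
      }
    }

With handling like this, a path option of "dir1,dir2" yields two qualified
paths, and a glob such as "dir1/*.json,dir2/*.json" expands per directory
rather than producing an empty result.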