Oh sorry, I misread your question. I thought you were trying something
like |parquetFile("s3n://file1,hdfs://file2")|. Yeah, it’s a valid bug.
Thanks for opening the JIRA ticket and the PR!
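A single comma-separated string like the one above becomes ambiguous as soon as one of the paths contains a comma. That is the reason given in the quoted message below for dropping this form in 1.3.0. The sketch below uses only the Scala standard library. `CommaSplitDemo` and `splitPathSpec` are hypothetical names, and `hdfs://nn/logs/a,b` is a made-up path. It shows what naive comma splitting does to such a path:

```scala
object CommaSplitDemo {
  // Naive splitting: roughly what any API accepting "path1,path2" strings must do.
  def splitPathSpec(spec: String): List[String] = spec.split(",").toList

  def main(args: Array[String]): Unit = {
    // Two paths on different file systems: splitting works as intended.
    println(splitPathSpec("s3n://bucket/a,hdfs://nn/b")) // List(s3n://bucket/a, hdfs://nn/b)
    // ONE directory whose name contains a comma (hypothetical path):
    // splitting silently turns it into two paths, neither of which is the real one.
    println(splitPathSpec("hdfs://nn/logs/a,b")) // List(hdfs://nn/logs/a, b)
  }
}
```

Passing each path as a separate argument avoids the ambiguity entirely, because no delimiter has to be parsed.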
Cheng
On 3/16/15 6:39 PM, Cheng Lian wrote:
Hi Pei-Lun,
We intentionally disallowed passing multiple comma-separated paths in
1.3.0. One of the reasons is that users reported failures when a
file path contains an actual comma. In your case, you can do
something like this:
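|// Load each file system's data separately, then concatenate the rows.
val s3nDF = sqlContext.parquetFile("s3n://...")
val hdfsDF = sqlContext.parquetFile("hdfs://...")
// DataFrame.unionAll (Spark 1.3) requires both sides to have the same schema.
val finalDF = s3nDF.unionAll(hdfsDF)
|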
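The same workaround extends to any number of paths. The sketch below assumes the Spark 1.3.x API, where `SQLContext.parquetFile` and `DataFrame.unionAll` exist. `MultiFsParquet` and `loadEach` are hypothetical names, and every path is assumed to share one schema because `unionAll` matches columns by position. Each path gets its own `parquetFile` call, so no comma-joined string is ever built:

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}

object MultiFsParquet {
  // Hypothetical helper: read each Parquet path on its own, then concatenate.
  // Assumes Spark 1.3.x and that all paths share the same schema.
  def loadEach(sqlContext: SQLContext, paths: Seq[String]): DataFrame = {
    require(paths.nonEmpty, "at least one path is required")
    paths
      .map(p => sqlContext.parquetFile(p)) // one DataFrame per path
      .reduce(_ unionAll _)                // append rows pairwise
  }
}
```

In a shell, usage would look like `MultiFsParquet.loadEach(sqlContext, Seq("s3n://...", "hdfs://..."))`. Later Spark releases deprecate `unionAll` in favor of `union`.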
Cheng
On 3/16/15 4:03 PM, Pei-Lun Lee wrote:
Hi,
I am using Spark 1.3.0, where I cannot load Parquet files from more than
one file system, say one on s3n://... and another on hdfs://.... This worked in
older versions, and still works in 1.3 if I set spark.sql.parquet.useDataSourceApi=false.
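The fallback flag mentioned above can also be set at runtime on an existing context. The minimal sketch below assumes a Spark 1.3.x shell where `sqlContext` is already defined. It uses `SQLContext.setConf(key, value)`:

```scala
// Route Parquet reads through the older (non data source API) code path.
// Assumes Spark 1.3.x with `sqlContext` in scope, as in spark-shell.
sqlContext.setConf("spark.sql.parquet.useDataSourceApi", "false")
```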
One way to fix this is, instead of getting a single FileSystem from the default
configuration in ParquetRelation2, to call Path.getFileSystem for each path.
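The difference between the two approaches can be sketched with the Hadoop FileSystem API. This is not the actual ParquetRelation2 patch. `PerPathFileSystem`, `listWithDefaultFs` and `listPerPathFs` are hypothetical names. The sketch assumes Hadoop's `FileSystem.get(Configuration)`, `Path.getFileSystem(Configuration)` and `FileSystem.listStatus(Path)`:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}

object PerPathFileSystem {
  // Problematic pattern: every path is looked up on the single default FileSystem
  // (fs.defaultFS). A path on another file system, e.g. s3n:// when the default is
  // hdfs://, is typically rejected with a "Wrong FS" IllegalArgumentException.
  def listWithDefaultFs(paths: Seq[String], conf: Configuration): Seq[FileStatus] = {
    val fs = FileSystem.get(conf)
    paths.flatMap(p => fs.listStatus(new Path(p)).toSeq)
  }

  // Approach suggested in the thread: resolve the FileSystem from each path's own
  // scheme and authority, so s3n:// and hdfs:// paths can be mixed in one call.
  def listPerPathFs(paths: Seq[String], conf: Configuration): Seq[FileStatus] =
    paths.flatMap { p =>
      val path = new Path(p)
      val fs = path.getFileSystem(conf) // picks the FileSystem matching this path's URI
      fs.listStatus(path).toSeq
    }
}
```

Hadoop caches FileSystem instances per scheme and authority by default, so calling `getFileSystem` once per path should not open a new connection each time.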
Here's the JIRA link and pull request:
https://issues.apache.org/jira/browse/SPARK-6351
https://github.com/apache/spark/pull/5039
Thanks,
--
Pei-Lun