It looks like this has been broken since around Spark 1.5; see JIRA SPARK-10185. It was fixed in PySpark, but unfortunately SparkR was missed. I have confirmed this is still broken in Spark 1.6. Could you please open a JIRA?
On Thu, Dec 3, 2015 at 2:08 PM -0800, "tomasr3" <tomas.rodrig...@transvoyant.com> wrote:

Hello,

I believe I have encountered a bug in Spark 1.5.2. I am using RStudio and SparkR to read in JSON files with jsonFile(sqlContext, "path"). If "path" is a single path (e.g., "/path/to/dir0"), it works fine; but when "path" is a vector of paths (e.g., path <- c("/path/to/dir1", "/path/to/dir2")), I get the following error:

> raw.terror <- jsonFile(sqlContext, path)
15/12/03 15:59:55 ERROR RBackendHandler: jsonFile on 1 failed
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
  java.io.IOException: No input paths specified in job
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:201)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2

Note that passing a vector of paths in Spark 1.4.1 works just fine. Any help is greatly appreciated if this is not a bug but instead an environment or other issue.

Best,
T

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SparkR-in-Spark-1-5-2-jsonFile-Bug-Found-tp25560.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
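Until a fix lands in SparkR, one possible workaround (a sketch only, assuming a working `sqlContext` and the SparkR 1.5.x `jsonFile`/`unionAll` API; the path names are illustrative) is to read each directory individually and union the resulting DataFrames, since `jsonFile` handles a single path correctly:

```r
library(SparkR)

# Illustrative paths; substitute your own directories.
paths <- c("/path/to/dir1", "/path/to/dir2")

# Read each path into its own DataFrame (single-path calls work),
# then fold them together with unionAll. Assumes all directories
# contain JSON with compatible schemas.
dfs <- lapply(paths, function(p) jsonFile(sqlContext, p))
combined <- Reduce(unionAll, dfs)
```

Note that `unionAll` resolves columns by position, not by name, so this sketch is only safe when every directory yields the same schema.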
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org