[ https://issues.apache.org/jira/browse/SPARK-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14990025#comment-14990025 ]
Neil Jonkers commented on SPARK-5068: ------------------------------------- We still see this issue on Spark 1.5: Hive can handle the missing partition, but spark-sql cannot.

HDFS:
{noformat}
$ hdfs dfs -lsr /data
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxr-xr-x   - hadoop hadoop  0 2015-11-04 17:40 /data/year=2015
drwxr-xr-x   - hadoop hadoop  0 2015-11-04 17:35 /data/year=2015/month=10
-rw-r--r--   1 hadoop hadoop 20 2015-11-04 17:35 /data/year=2015/month=10/names
{noformat}

Spark:
{noformat}
15/11/04 17:47:46 INFO ParseDriver: Parsing command: select * from th
15/11/04 17:47:46 INFO ParseDriver: Parse Completed
15/11/04 17:47:46 INFO MemoryStore: ensureFreeSpace(481656) called with curMem=9466, maxMem=560993402
15/11/04 17:47:46 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 470.4 KB, free 534.5 MB)
15/11/04 17:47:46 INFO MemoryStore: ensureFreeSpace(45219) called with curMem=491122, maxMem=560993402
15/11/04 17:47:46 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 44.2 KB, free 534.5 MB)
15/11/04 17:47:46 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 10.43.193.77:57944 (size: 44.2 KB, free: 535.0 MB)
15/11/04 17:47:46 INFO SparkContext: Created broadcast 3 from processCmd at CliDriver.java:376
15/11/04 17:47:46 INFO FileInputFormat: Total input paths to process : 1
15/11/04 17:47:46 ERROR SparkSQLDriver: Failed in [select * from th]
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://ip-10-43-193-77.ec2.internal:8020/data/year=2015/month=11
	at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:251)
	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:200)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:279)
	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)
{noformat}

> When the path not found in the hdfs,we can't get the result
> -----------------------------------------------------------
>
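Two workarounds may apply to the 1.5 report above; both are a sketch, not a confirmed resolution. The Fix For 1.4.0 on this ticket added a `spark.sql.hive.verifyPartitionPath` option (off by default, which would explain why 1.5 still fails out of the box); alternatively, removing the stale partition from the metastore on the Hive side makes the two sources agree again. Table and partition names below are taken from the comment above; behavior on 1.5 is an assumption.

```sql
-- Option 1 (assumption: available since the 1.4.0 fix on this ticket):
-- ask Spark to verify each partition path in HDFS before scanning it,
-- so partitions registered in the metastore but missing on disk are skipped.
SET spark.sql.hive.verifyPartitionPath=true;
SELECT * FROM th;

-- Option 2: drop the stale partition from the Hive metastore so it no
-- longer references the missing HDFS directory (partition spec taken
-- from the listing above).
ALTER TABLE th DROP PARTITION (year=2015, month=11);
```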
> Key: SPARK-5068
> URL: https://issues.apache.org/jira/browse/SPARK-5068
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.2.0
> Reporter: jeanlyn
> Assignee: dongxu
> Fix For: 1.4.0
>
> When a partition path is found in the metastore but not found in HDFS, it causes problems such as the following:
> {noformat}
> hive> show partitions partition_test;
> OK
> dt=1
> dt=2
> dt=3
> dt=4
> Time taken: 0.168 seconds, Fetched: 4 row(s)
> {noformat}
> {noformat}
> hive> dfs -ls /user/jeanlyn/warehouse/partition_test;
> Found 3 items
> drwxr-xr-x - jeanlyn supergroup 0 2014-12-02 16:29 /user/jeanlyn/warehouse/partition_test/dt=1
> drwxr-xr-x - jeanlyn supergroup 0 2014-12-02 16:29 /user/jeanlyn/warehouse/partition_test/dt=3
> drwxr-xr-x - jeanlyn supergroup 0 2014-12-02 17:42 /user/jeanlyn/warehouse/partition_test/dt=4
> {noformat}
> When I run the SQL
> {noformat}
> select * from partition_test limit 10
> {noformat}
> in *hive*, it works fine, but when I run it in *spark-sql* I get the following error:
> {noformat}
> Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://jeanlyn:9000/user/jeanlyn/warehouse/partition_test/dt=2
> at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:251)
> at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
> at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
> at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
> at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
> at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
> at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> at scala.collection.AbstractTraversable.map(Traversable.scala:105)
> at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:66)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
> at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1328)
> at org.apache.spark.rdd.RDD.collect(RDD.scala:780)
> at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:84)
> at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:444)
> at org.apache.spark.sql.hive.testpartition$.main(test.scala:23)
> at org.apache.spark.sql.hive.testpartition.main(test.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org