[ https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15287256#comment-15287256 ]

saurabh paliwal commented on SPARK-15044:
-----------------------------------------

Anyway, as a workaround, I have just caught the exception for now, which works fine.
In org.apache.spark.rdd.HadoopRDD (HadoopRDD.scala):

{code}
override def getPartitions: Array[Partition] = {
    val jobConf = getJobConf()
    // add the credentials here as this can be called before SparkContext initialized
    SparkHadoopUtil.get.addCredentials(jobConf)
    try {
      val inputFormat = getInputFormat(jobConf)
      val inputSplits = inputFormat.getSplits(jobConf, minPartitions)
      val array = new Array[Partition](inputSplits.size)
      for (i <- 0 until inputSplits.size) {
        array(i) = new HadoopPartition(id, i, inputSplits(i))
      }
      array
    } catch {
      case e: Exception =>
        logWarning("getSplits returned no splits for given input", e)
        new Array[Partition](0)
    }
  }
{code}
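
A narrower variant of the same idea (just a sketch, not tested against the Spark code base) would catch only the InvalidInputException that FileInputFormat.getSplits raises for a missing path, so unrelated failures still propagate; splitsOrEmpty and computeSplits below are hypothetical stand-ins for the getSplits/HadoopPartition logic:

{code}
import scala.reflect.ClassTag
import org.apache.hadoop.mapred.InvalidInputException

// Sketch only: swallow just the "input path does not exist" failure and
// return an empty partition array, letting any other exception surface.
def splitsOrEmpty[T: ClassTag](computeSplits: => Array[T]): Array[T] =
  try {
    computeSplits
  } catch {
    case e: InvalidInputException =>
      Console.err.println(s"getSplits returned no splits for given input: ${e.getMessage}")
      Array.empty[T]
  }
{code}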

> spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually
> -------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-15044
>                 URL: https://issues.apache.org/jira/browse/SPARK-15044
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1
>            Reporter: huangyu
>
> spark-sql will throw an "input path does not exist" exception if it handles a partition which exists in the Hive table, but the path has been removed manually. The situation is as follows:
> 1) Create a table "test": "create table test (n string) partitioned by (p string)"
> 2) Load some data into partition(p='1')
> 3) Remove the path related to partition(p='1') of table test manually: "hadoop fs -rmr ..../warehouse/..../test/p=1"
> 4) Run spark-sql: spark-sql -e "select n from test where p='1';" (see the consolidated sketch below)
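> The steps above, consolidated as a Scala sketch (an illustration only, assuming a Spark 1.6 HiveContext named sqlContext; the data file and warehouse paths are placeholders):
> {code}
> // 1) + 2): create the partitioned table and load one partition through Hive
> sqlContext.sql("create table test (n string) partitioned by (p string)")
> sqlContext.sql("load data local inpath '/tmp/data.txt' into table test partition (p='1')")
>
> // 3): the partition directory is then removed out-of-band, e.g.
> //     hadoop fs -rmr ..../warehouse/..../test/p=1
>
> // 4): querying the partition now fails with InvalidInputException on 1.6.1
> sqlContext.sql("select n from test where p='1'").collect()
> {code}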
> Then it throws exception:
> {code}
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: 
> ...../test/p=1
>         at 
> org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
>         at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
>         at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304)
>         at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>         at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>         at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>         at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
>         at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
>         at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>         at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>         at scala.collection.immutable.List.foreach(List.scala:318)
>         at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>         at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>         at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:66)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>         at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>         at scala.Option.getOrElse(Option.scala:120)
> {code}
> The bug is in Spark 1.6.1; with Spark 1.4.0 it is OK.
> I think spark-sql should ignore the missing path, just as Hive does and as earlier versions did, rather than throw an exception.



