[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15566349#comment-15566349
 ] 

Jerome Scheuring edited comment on SPARK-12216 at 10/11/16 7:34 PM:
--------------------------------------------------------------------

_Note that I am entirely new to the process of submitting issues on this 
system: if this needs to be a new issue, I would appreciate someone letting me 
know._

A bug very similar to this one is 100% reproducible across multiple machines 
running both Windows 8.1 and Windows 10, with code compiled against Scala 2.11 
and run under Spark 2.0.1.

It occurs

* in Scala, but not in Python (I have not tried R)
* only when reading CSV files (not, for example, when reading Parquet files)
* only when running locally, not when submitted to a cluster

This program reproduces the bug (if {{poemData}} is defined per the 
commented-out section rather than being read from a CSV file, the bug does not 
occur):

{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object SparkBugDemo {
  def main(args: Array[String]): Unit = {

    val poemSchema = StructType(
      Seq(
        StructField("label", IntegerType),
        StructField("line", StringType)
      )
    )

    val sparkSession = SparkSession.builder()
      .appName("Spark Bug Demonstration")
      .master("local[*]")
      .getOrCreate()

//    val poemData = sparkSession.createDataFrame(Seq(
//      (0, "There's many a strong farmer"),
//      (0, "Who's heart would break in two"),
//      (1, "If he could see the townland"),
//      (1, "That we are riding to;")
//    )).toDF("label", "line")

    val poemData = sparkSession.read
      .option("quote", "")
      .schema(poemSchema)
      .csv(args(0))

    println(s"Record count: ${poemData.count()}")

  }
}
{code}

Assuming that {{args(0)}} contains the path to a file with comma-separated 
integer/string pairs, as in:

{noformat}
0,There's many a strong farmer
0,Who's heart would break in two
1,If he could see the townland
1,That we are riding to;
{noformat}
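For convenience, the sample input file can be generated with a short helper; {{MakeSampleCsv}} is a hypothetical name used only for this sketch, not part of the repro itself:

```scala
import java.nio.file.{Files, Paths}

// Writes the four sample lines shown above to a CSV file, so the repro
// can then be run as: SparkBugDemo <path-to-file>
object MakeSampleCsv {
  val sampleLines = Seq(
    "0,There's many a strong farmer",
    "0,Who's heart would break in two",
    "1,If he could see the townland",
    "1,That we are riding to;"
  )

  def write(path: String): Unit =
    Files.write(Paths.get(path), (sampleLines.mkString("\n") + "\n").getBytes("UTF-8"))

  def main(args: Array[String]): Unit =
    write(if (args.nonEmpty) args(0) else "poem.csv")
}
```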



> Spark failed to delete temp directory 
> --------------------------------------
>
>                 Key: SPARK-12216
>                 URL: https://issues.apache.org/jira/browse/SPARK-12216
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Shell
>         Environment: windows 7 64 bit
> Spark 1.5.2
> Java 1.8.0_65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have permissions
> drwxrwxrwx as detected by winutils ls
>            Reporter: stefan
>            Priority: Minor
>
> The mailing list archives have no obvious solution to this:
> scala> :q
> Stopping spark context.
> 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark 
> temp dir: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> java.io.IOException: Failed to delete: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
>         at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884)
>         at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:63)
>         at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:60)
>         at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>         at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:60)
>         at 
> org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:264)
>         at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:234)
>         at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
>         at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
>         at 
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
>         at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:234)
>         at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
>         at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
>         at scala.util.Try$.apply(Try.scala:161)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:234)
>         at 
> org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:216)
>         at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
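The delete that fails above is a recursive removal of the Spark temp directory. A minimal sketch of what the shutdown hook attempts is below; {{TempDirCleanup}} is an illustrative helper, not Spark's actual {{Utils.deleteRecursively}}. On Windows, {{File.delete()}} returns false while any process, including the current JVM, still holds an open handle on a file inside the directory, which is the usual cause of this "Failed to delete" error.

```scala
import java.io.File
import java.nio.file.Files

object TempDirCleanup {
  // Depth-first delete: remove children first, then the directory itself.
  // Returns the result of deleting the top-level entry; on Windows this is
  // false whenever an open file handle remains inside the directory.
  def deleteRecursively(f: File): Boolean = {
    if (f.isDirectory) {
      Option(f.listFiles()).getOrElse(Array.empty[File]).foreach(deleteRecursively)
    }
    f.delete()
  }

  def main(args: Array[String]): Unit = {
    val dir = Files.createTempDirectory("spark-demo-").toFile
    Files.createTempFile(dir.toPath, "lock", ".tmp")
    println(s"deleted: ${deleteRecursively(dir)}")
  }
}
```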



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
