[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15566349#comment-15566349 ]
Jerome Scheuring edited comment on SPARK-12216 at 10/11/16 7:34 PM:
--------------------------------------------------------------------

_Note that I am entirely new to the process of submitting issues on this system: if this needs to be a new issue, I would appreciate someone letting me know._

A bug very similar to this one is 100% reproducible across multiple machines, running both Windows 8.1 and Windows 10, compiled with Scala 2.11 and running under Spark 2.0.1.

It occurs:
* in Scala, but not in Python (I have not tried R)
* only when reading CSV files (not, for example, when reading Parquet files)
* only when running locally, not when submitted to a cluster

This program will produce the bug (if {{poemData}} is defined per the commented-out section, rather than being read from a CSV file, the bug does not occur):

{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object SparkBugDemo {
  def main(args: Array[String]): Unit = {
    val poemSchema = StructType(Seq(
      StructField("label", IntegerType),
      StructField("line", StringType)
    ))

    val sparkSession = SparkSession.builder()
      .appName("Spark Bug Demonstration")
      .master("local[*]")
      .getOrCreate()

    // val poemData = sparkSession.createDataFrame(Seq(
    //   (0, "There's many a strong farmer"),
    //   (0, "Who's heart would break in two"),
    //   (1, "If he could see the townland"),
    //   (1, "That we are riding to;")
    // )).toDF("label", "line")

    val poemData = sparkSession.read
      .option("quote", value = "")
      .schema(poemSchema)
      .csv(args(0))

    println(s"Record count: ${poemData.count()}")
  }
}
{code}

Assuming that {{args(0)}} contains the path to a file with comma-separated integer/string pairs, as in:

{noformat}
0,There's many a strong farmer
0,Who's heart would break in two
1,If he could see the townland
1,That we are riding to;
{noformat}

> Spark failed to delete temp directory
> --------------------------------------
>
> Key: SPARK-12216
> URL: https://issues.apache.org/jira/browse/SPARK-12216
> Project: Spark
> Issue Type: Bug
> Components: Spark Shell
> Environment: Windows 7 64-bit
> Spark 1.5.2
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have permissions
> drwxrwxrwx as detected by winutils ls
> Reporter: stefan
> Priority: Minor
>
> The mailing list archives have no obvious solution to this:
> scala> :q
> Stopping spark context.
> 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark temp dir:
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> java.io.IOException: Failed to delete:
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
>         at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884)
>         at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:63)
>         at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:60)
>         at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>         at org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:60)
>         at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:264)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:234)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
>         at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:234)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
>         at scala.util.Try$.apply(Try.scala:161)
>         at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:234)
>         at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:216)
>         at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
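Editor's note on the trace above: {{Utils$.deleteRecursively}} (the top frame) walks the temp directory depth-first and raises the {{java.io.IOException: Failed to delete}} when any single delete fails, which on Windows typically means another process still holds a handle on a file inside the directory. The following is a simplified, hypothetical sketch of that pattern in plain Scala, not Spark's actual implementation:

{code}
import java.io.{File, IOException}
import java.nio.file.Files

object DeleteDemo {
  // Depth-first recursive deletion: delete children before the directory
  // itself, and surface a failed delete as an IOException, mirroring the
  // error message shape seen in the stack trace above.
  def deleteRecursively(file: File): Unit = {
    if (file.isDirectory) {
      // listFiles() can return null (e.g. on an I/O error), so guard it.
      Option(file.listFiles()).getOrElse(Array.empty[File]).foreach(deleteRecursively)
    }
    if (!file.delete() && file.exists()) {
      throw new IOException(s"Failed to delete: ${file.getAbsolutePath}")
    }
  }

  def main(args: Array[String]): Unit = {
    // Create a throwaway temp directory with one file in it, then remove it.
    val dir = Files.createTempDirectory("delete-demo-").toFile
    new File(dir, "part-00000").createNewFile()
    deleteRecursively(dir)
    println(s"directory still exists: ${dir.exists()}")
  }
}
{code}

On a platform without mandatory file locking the delete succeeds; the Windows failure mode reported here occurs when {{file.delete()}} returns false because the file is still open elsewhere, which is what turns a clean shutdown hook into the ERROR above.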