Imran Rashid created SPARK-17676:
------------------------------------

             Summary: FsHistoryProvider should ignore hidden files
                 Key: SPARK-17676
                 URL: https://issues.apache.org/jira/browse/SPARK-17676
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
            Reporter: Imran Rashid
            Assignee: Imran Rashid
            Priority: Minor


FsHistoryProvider currently reads hidden files (beginning with ".") from the 
log dir.  However, it also writes a hidden file *itself* to that dir, as part 
of a trick to discover the scan time according to the file system's clock, and 
that file cannot be parsed as an event log:

{code}
    // probe the file system's clock by creating (and later deleting) a
    // temporary hidden file in the log dir
    val fileName = "." + UUID.randomUUID().toString
    val path = new Path(logDir, fileName)
    val fos = fs.create(path)
{code}

It does delete the tmp file immediately, but we've seen cases where the 
directory scan races with the delete and picks up the tmp file, producing a 
logged error.  The error is harmless (the file is ignored and Spark moves on 
to the other log files), but it is very confusing for users, so we should 
avoid it.

{noformat}
2016-09-26 09:10:03,016 ERROR org.apache.spark.deploy.history.FsHistoryProvider: Exception encountered when attempting to load application log hdfs://XXX/user/spark/applicationHistory/.3a5e987c-ace5-4568-9ccd-6285010e399a
java.lang.IllegalArgumentException: Codec [3a5e987c-ace5-4568-9ccd-6285010e399a] is not available. Consider setting spark.io.compression.codec=lzf
	at org.apache.spark.io.CompressionCodec$$anonfun$createCodec$1.apply(CompressionCodec.scala:72)
	at org.apache.spark.io.CompressionCodec$$anonfun$createCodec$1.apply(CompressionCodec.scala:72)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:72)
	at org.apache.spark.scheduler.EventLoggingListener$$anonfun$8$$anonfun$apply$1.apply(EventLoggingListener.scala:309)
	at org.apache.spark.scheduler.EventLoggingListener$$anonfun$8$$anonfun$apply$1.apply(EventLoggingListener.scala:309)
	at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189)
	at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91)
	at org.apache.spark.scheduler.EventLoggingListener$$anonfun$8.apply(EventLoggingListener.scala:309)
	at org.apache.spark.scheduler.EventLoggingListener$$anonfun$8.apply(EventLoggingListener.scala:308)
	at scala.Option.map(Option.scala:145)
	at org.apache.spark.scheduler.EventLoggingListener$.openEventLog(EventLoggingListener.scala:308)
	at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$replay(FsHistoryProvider.scala:518)
	at org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:359)
	at org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:356)
	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
	at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$mergeApplicationListing(FsHistoryProvider.scala:356)
	at org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$checkForLogs$1$$anon$4.run(FsHistoryProvider.scala:277)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
{noformat}
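A minimal sketch of the proposed behavior (names here are hypothetical, not the actual Spark patch): filter out any log-dir entry whose name begins with a dot before attempting to replay it, so the provider's own clock-probe tmp file is never parsed as an application log.

{code}
// Hypothetical helper illustrating the fix: treat dot-prefixed entries as
// hidden and exclude them from the set of candidate event logs.
object HiddenFileFilter {
  // A file is "hidden" by convention if its name starts with "."
  def isHidden(name: String): Boolean = name.startsWith(".")

  // Keep only entries that should be considered as application logs
  def visibleLogs(names: Seq[String]): Seq[String] =
    names.filterNot(isHidden)
}
{code}

With this filter in place, a scan that races with the tmp-file delete simply never sees the tmp file, so no error is logged.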



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
