Imran Rashid created SPARK-17676:
------------------------------------
Summary: FsHistoryProvider should ignore hidden files
Key: SPARK-17676
URL: https://issues.apache.org/jira/browse/SPARK-17676
Project: Spark
Issue Type: Bug
Components: Spark Core
Reporter: Imran Rashid
Assignee: Imran Rashid
Priority: Minor

FsHistoryProvider currently reads hidden files (names beginning with ".") from the
log dir. However, it writes a hidden file *itself* to that dir, as part of a trick
to determine the scan time according to the file system's clock, and that file can
never be parsed as an event log:
{code}
// Create (and later delete) a hidden temp file purely to read the
// file system's own clock; the "." prefix marks it as hidden.
val fileName = "." + UUID.randomUUID().toString
val path = new Path(logDir, fileName)
val fos = fs.create(path)
{code}
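For context, the snippet above is only the beginning of the trick. A hedged sketch of how the scan time is then read and the tmp file cleaned up might look like the following; the object and method names here are illustrative, not the exact FsHistoryProvider code:
{code}
import java.util.UUID
import org.apache.hadoop.fs.{FileSystem, Path}

object ScanTimeSketch {
  // Illustrative only: create a throwaway hidden file, read its modification
  // time so the timestamp comes from the file system rather than the local
  // clock, then delete it. The delete races with the scanning thread.
  def newLastScanTime(fs: FileSystem, logDir: Path): Long = {
    val path = new Path(logDir, "." + UUID.randomUUID().toString)
    val fos = fs.create(path)
    try {
      fos.close()
      fs.getFileStatus(path).getModificationTime
    } finally {
      fs.delete(path, true) // a scan that runs before this delete sees the hidden file
    }
  }
}
{code}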
The provider does delete the tmp file immediately, but we have seen that race end
badly: the scanning thread picks up the hidden file first and logs an error. The
error is harmless (the file is ignored and Spark moves on to the other log files),
but it is very confusing for users, so we should avoid it.
{noformat}
2016-09-26 09:10:03,016 ERROR org.apache.spark.deploy.history.FsHistoryProvider: Exception encountered when attempting to load application log hdfs://XXX/user/spark/applicationHistory/.3a5e987c-ace5-4568-9ccd-6285010e399a
java.lang.IllegalArgumentException: Codec [3a5e987c-ace5-4568-9ccd-6285010e399a] is not available. Consider setting spark.io.compression.codec=lzf
    at org.apache.spark.io.CompressionCodec$$anonfun$createCodec$1.apply(CompressionCodec.scala:72)
    at org.apache.spark.io.CompressionCodec$$anonfun$createCodec$1.apply(CompressionCodec.scala:72)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:72)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$8$$anonfun$apply$1.apply(EventLoggingListener.scala:309)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$8$$anonfun$apply$1.apply(EventLoggingListener.scala:309)
    at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189)
    at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$8.apply(EventLoggingListener.scala:309)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$8.apply(EventLoggingListener.scala:308)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.scheduler.EventLoggingListener$.openEventLog(EventLoggingListener.scala:308)
    at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$replay(FsHistoryProvider.scala:518)
    at org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:359)
    at org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:356)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
    at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
    at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$mergeApplicationListing(FsHistoryProvider.scala:356)
    at org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$checkForLogs$1$$anon$4.run(FsHistoryProvider.scala:277)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
{noformat}
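A minimal sketch of the kind of filter the fix needs, assuming a Hadoop FileSystem listing of the log dir (object, method, and path names below are illustrative, not the actual FsHistoryProvider patch): skip any entry whose name starts with "." when building the list of logs to replay.
{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}

object HiddenFileFilterSketch {
  // Return only the visible entries of the log dir, so the scan-time tmp
  // file (and any other hidden file) is never handed to the replay code.
  def visibleLogs(fs: FileSystem, logDir: Path): Seq[FileStatus] =
    fs.listStatus(logDir).toSeq
      .filterNot(_.getPath.getName.startsWith("."))

  def main(args: Array[String]): Unit = {
    val logDir = new Path("hdfs:///user/spark/applicationHistory") // hypothetical path
    val fs = logDir.getFileSystem(new Configuration())
    visibleLogs(fs, logDir).foreach(s => println(s.getPath))
  }
}
{code}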