[ https://issues.apache.org/jira/browse/SPARK-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540325#comment-14540325 ]
Dibyendu Bhattacharya commented on SPARK-7525:
----------------------------------------------
I guess this has something to do with Tachyon's lack of append support. The write ahead log writer fails with the following exception:
java.lang.IllegalStateException: File exists and there is no append support!
at org.apache.spark.streaming.util.HdfsUtils$.getOutputStream(HdfsUtils.scala:33)
at org.apache.spark.streaming.util.FileBasedWriteAheadLogWriter.org$apache$spark$streaming$util$FileBasedWriteAheadLogWriter$$stream$lzycompute(FileBasedWriteAheadLogWriter.scala:33)
at org.apache.spark.streaming.util.FileBasedWriteAheadLogWriter.org$apache$spark$streaming$util$FileBasedWriteAheadLogWriter$$stream(FileBasedWriteAheadLogWriter.scala:33)
at org.apache.spark.streaming.util.FileBasedWriteAheadLogWriter.<init>(FileBasedWriteAheadLogWriter.scala:41)
at org.apache.spark.streaming.util.FileBasedWriteAheadLog.getLogWriter(FileBasedWriteAheadLog.scala:194)
at org.apache.spark.streaming.util.FileBasedWriteAheadLog.write(FileBasedWriteAheadLog.scala:81)
at org.apache.spark.streaming.util.FileBasedWriteAheadLog.write(FileBasedWriteAheadLog.scala:44)
at org.apache.spark.streaming.receiver.WriteAheadLogBasedBlockHandler$$anonfun$5.apply(ReceivedBlockHandler.scala:178)
at org.apache.spark.streaming.receiver.WriteAheadLogBasedBlockHandler$$anonfun$5.apply(ReceivedBlockHandler.scala:178)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
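
To make the guess above concrete, here is a minimal sketch of the decision the exception message suggests: create the log file if it does not exist, append if it exists and the file system supports append, and fail otherwise. This is a simplified model, not Spark's actual code; the names WalOpenSketch, Mode and openMode are hypothetical and exist only for illustration.

```java
// Simplified model of the open-or-append decision suggested by the
// exception message. NOT Spark's implementation: WalOpenSketch, Mode and
// openMode are hypothetical names used only for this illustration.
class WalOpenSketch {
    enum Mode { CREATE, APPEND }

    // Decide how a log file would be opened for writing.
    static Mode openMode(boolean fileExists, boolean appendSupported) {
        if (!fileExists) {
            // No file yet: a fresh file can always be created.
            return Mode.CREATE;
        }
        if (appendSupported) {
            // File exists and the file system can append to it.
            return Mode.APPEND;
        }
        // File exists but cannot be appended to: the case seen in the trace.
        throw new IllegalStateException("File exists and there is no append support!");
    }

    public static void main(String[] args) {
        System.out.println(openMode(false, false));
        System.out.println(openMode(true, true));
        try {
            openMode(true, false);
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

If this model is roughly right, a restarted receiver that reuses an existing log file name on a store without append support would hit the third branch.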
> Could not read data from write ahead log record when Receiver failed and WAL is stored in Tachyon
> -------------------------------------------------------------------------------------------------
>
> Key: SPARK-7525
> URL: https://issues.apache.org/jira/browse/SPARK-7525
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 1.4.0
> Environment: AWS, Spark Streaming 1.4 with Tachyon 0.6.4
> Reporter: Dibyendu Bhattacharya
>
> I was testing the fault tolerance of Spark Streaming with the checkpoint
> directory stored in Tachyon. Spark Streaming is able to recover from a
> driver failure, but when a receiver fails, Spark Streaming is not able to
> read the WAL files written by the failed receiver. Below is the exception
> raised when the receiver fails.
> INFO : org.apache.spark.scheduler.DAGScheduler - Executor lost: 2 (epoch 1)
> INFO : org.apache.spark.storage.BlockManagerMasterEndpoint - Trying to remove executor 2 from BlockManagerMaster.
> INFO : org.apache.spark.storage.BlockManagerMasterEndpoint - Removing block manager BlockManagerId(2, 10.252.5.54, 45789)
> INFO : org.apache.spark.storage.BlockManagerMaster - Removed 2 successfully in removeExecutor
> INFO : org.apache.spark.streaming.scheduler.ReceiverTracker - Registered receiver for stream 2 from 10.252.5.62:47255
> WARN : org.apache.spark.scheduler.TaskSetManager - Lost task 2.1 in stage 103.0 (TID 421, 10.252.5.62): org.apache.spark.SparkException: Could not read data from write ahead log record FileBasedWriteAheadLogSegment(tachyon-ft://10.252.5.113:19998/tachyon/checkpoint/receivedData/2/log-1431341091711-1431341151711,645603894,10891919)
> at org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD.org$apache$spark$streaming$rdd$WriteAheadLogBackedBlockRDD$$getBlockFromWriteAheadLog$1(WriteAheadLogBackedBlockRDD.scala:144)
> at org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD$$anonfun$compute$1.apply(WriteAheadLogBackedBlockRDD.scala:168)
> at org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD$$anonfun$compute$1.apply(WriteAheadLogBackedBlockRDD.scala:168)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD.compute(WriteAheadLogBackedBlockRDD.scala:168)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
> at org.apache.spark.scheduler.Task.run(Task.scala:70)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.IllegalArgumentException: Seek position is past EOF: 645603894, fileSize = 0
> at tachyon.hadoop.HdfsFileInputStream.seek(HdfsFileInputStream.java:239)
> at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:37)
> at org.apache.spark.streaming.util.FileBasedWriteAheadLogRandomReader.read(FileBasedWriteAheadLogRandomReader.scala:37)
> at org.apache.spark.streaming.util.FileBasedWriteAheadLog.read(FileBasedWriteAheadLog.scala:104)
> at org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD.org$apache$spark$streaming$rdd$WriteAheadLogBackedBlockRDD$$getBlockFromWriteAheadLog$1(WriteAheadLogBackedBlockRDD.scala:141)
> ... 15 more
> INFO : org.apache.spark.scheduler.TaskSetManager - Starting task 2.2 in stage 103.0 (TID 422, 10.252.5.61, ANY, 1909 bytes)
> INFO : org.apache.spark.scheduler.TaskSetManager - Lost task 2.2 in stage 103.0 (TID 422) on executor 10.252.5.61: org.apache.spark.SparkException (Could not read data from write ahead log record FileBasedWriteAheadLogSegment(tachyon-ft://10.252.5.113:19998/tachyon/checkpoint/receivedData/2/log-1431341091711-1431341151711,645603894,10891919)) [duplicate 1]
> INFO : org.apache.spark.scheduler.TaskSetManager - Starting task 2.3 in stage 103.0 (TID 423, 10.252.5.62, ANY, 1909 bytes)
> INFO : org.apache.spark.deploy.client.AppClient$ClientActor - Executor updated: app-20150511104442-0048/2 is now LOST (worker lost)
> INFO : org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend - Executor app-20150511104442-0048/2 removed: worker lost
> ERROR: org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend - Asked to remove non-existent executor 2
> INFO : org.apache.spark.scheduler.TaskSetManager - Lost task 2.3 in stage 103.0 (TID 423) on executor 10.252.5.62: org.apache.spark.SparkException (Could not read data from write ahead log record FileBasedWriteAheadLogSegment(tachyon-ft://10.252.5.113:19998/tachyon/checkpoint/receivedData/2/log-1431341091711-1431341151711,645603894,10891919)) [duplicate 2]
> ERROR: org.apache.spark.scheduler.TaskSetManager - Task 2 in stage 103.0 failed 4 times; aborting job
> INFO : org.apache.spark.scheduler.TaskSchedulerImpl - Removed TaskSet 103.0, whose tasks have all completed, from pool
> INFO : org.apache.spark.scheduler.TaskSchedulerImpl - Cancelling stage 103
> INFO : org.apache.spark.scheduler.DAGScheduler - ResultStage 103 (foreachRDD at Consumer.java:92) failed in 0.943 s
> INFO : org.apache.spark.scheduler.DAGScheduler - Job 120 failed: foreachRDD at Consumer.java:92, took 0.953482 s
> ERROR: org.apache.spark.streaming.scheduler.JobScheduler - Error running job streaming job 1431341145000 ms.0
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 103.0 failed 4 times, most recent failure: Lost task 2.3 in stage 103.0 (TID 423, 10.252.5.62): org.apache.spark.SparkException: Could not read data from write ahead log record FileBasedWriteAheadLogSegment(tachyon-ft://10.252.5.113:19998/tachyon/checkpoint/receivedData/2/log-1431341091711-1431341151711,645603894,10891919)
> at org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD.org$apache$spark$streaming$rdd$WriteAheadLogBackedBlockRDD$$getBlockFromWriteAheadLog$1(WriteAheadLogBackedBlockRDD.scala:144)
> at org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD$$anonfun$compute$1.apply(WriteAheadLogBackedBlockRDD.scala:168)
> at org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD$$anonfun$compute$1.apply(WriteAheadLogBackedBlockRDD.scala:168)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD.compute(WriteAheadLogBackedBlockRDD.scala:168)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
> at org.apache.spark.scheduler.Task.run(Task.scala:70)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.IllegalArgumentException: Seek position is past EOF: 645603894, fileSize = 0
> at tachyon.hadoop.HdfsFileInputStream.seek(HdfsFileInputStream.java:239)
> at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:37)
> at org.apache.spark.streaming.util.FileBasedWriteAheadLogRandomReader.read(FileBasedWriteAheadLogRandomReader.scala:37)
> at org.apache.spark.streaming.util.FileBasedWriteAheadLog.read(FileBasedWriteAheadLog.scala:104)
> at org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD.org$apache$spark$streaming$rdd$WriteAheadLogBackedBlockRDD$$getBlockFromWriteAheadLog$1(WriteAheadLogBackedBlockRDD.scala:141)
> ... 15 more
> Driver stacktrace:
> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1266)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1257)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1256)
> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1256)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
> at scala.Option.foreach(Option.scala:236)
> at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1450)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1411)
> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 103.0 failed 4 times, most recent failure: Lost task 2.3 in stage 103.0 (TID 423, 10.252.5.62): org.apache.spark.SparkException: Could not read data from write ahead log record FileBasedWriteAheadLogSegment(tachyon-ft://10.252.5.113:19998/tachyon/checkpoint/receivedData/2/log-1431341091711-1431341151711,645603894,10891919)
> at org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD.org$apache$spark$streaming$rdd$WriteAheadLogBackedBlockRDD$$getBlockFromWriteAheadLog$1(WriteAheadLogBackedBlockRDD.scala:144)
> at org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD$$anonfun$compute$1.apply(WriteAheadLogBackedBlockRDD.scala:168)
> at org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD$$anonfun$compute$1.apply(WriteAheadLogBackedBlockRDD.scala:168)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD.compute(WriteAheadLogBackedBlockRDD.scala:168)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
> at org.apache.spark.scheduler.Task.run(Task.scala:70)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.IllegalArgumentException: Seek position is past EOF: 645603894, fileSize = 0
> at tachyon.hadoop.HdfsFileInputStream.seek(HdfsFileInputStream.java:239)
> at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:37)
> at org.apache.spark.streaming.util.FileBasedWriteAheadLogRandomReader.read(FileBasedWriteAheadLogRandomReader.scala:37)
> at org.apache.spark.streaming.util.FileBasedWriteAheadLog.read(FileBasedWriteAheadLog.scala:104)
> at org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD.org$apache$spark$streaming$rdd$WriteAheadLogBackedBlockRDD$$getBlockFromWriteAheadLog$1(WriteAheadLogBackedBlockRDD.scala:141)
> ... 15 more
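
For anyone reading the log above, the failing record can be checked by hand. The sketch below pulls the two trailing numbers out of a FileBasedWriteAheadLogSegment string and tests whether that byte range fits inside a file of a given size. I am assuming the numbers are the offset and the length of the record (the "Seek position is past EOF: 645603894" message is consistent with the first one being the offset). WalSegmentCheck and its methods are hypothetical helpers, not part of Spark or Tachyon.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading the log above; not part of Spark or Tachyon.
class WalSegmentCheck {
    // Matches "FileBasedWriteAheadLogSegment(<path>,<number>,<number>)".
    private static final Pattern SEGMENT =
        Pattern.compile("FileBasedWriteAheadLogSegment\\((.+),(\\d+),(\\d+)\\)");

    // Returns {offset, length}, assuming that is what the two numbers mean.
    static long[] parseOffsetAndLength(String record) {
        Matcher m = SEGMENT.matcher(record);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a segment record: " + record);
        }
        return new long[] { Long.parseLong(m.group(2)), Long.parseLong(m.group(3)) };
    }

    // A record is readable only if its whole byte range lies inside the file.
    static boolean isReadable(long offset, long length, long fileSize) {
        return offset >= 0 && length >= 0 && offset + length <= fileSize;
    }

    public static void main(String[] args) {
        String record = "FileBasedWriteAheadLogSegment("
            + "tachyon-ft://10.252.5.113:19998/tachyon/checkpoint/receivedData/2/"
            + "log-1431341091711-1431341151711,645603894,10891919)";
        long[] seg = parseOffsetAndLength(record);
        // fileSize = 0 is the value reported in the "Caused by" line of the log.
        System.out.println("readable with fileSize 0: " + isReadable(seg[0], seg[1], 0L));
    }
}
```

With the reported fileSize of 0, no offset greater than zero can be valid, which matches the seek failure in the trace.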