[ https://issues.apache.org/jira/browse/SPARK-14695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Liang Lee updated SPARK-14695:
------------------------------
Description:
When running a PageRank job through the default examples, e.g., the class 'org.apache.spark.examples.graphx.Analytics' in the spark-examples-1.6.0-hadoop2.6.0.jar package, we got the following errors:

16/04/18 03:16:55 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 9.0 (TID 66) in 1662 ms on R1S1 (1/10)
16/04/18 03:16:55 INFO scheduler.TaskSetManager: Finished task 8.0 in stage 9.0 (TID 73) in 1663 ms on R1S1 (2/10)
16/04/18 03:16:55 INFO scheduler.TaskSetManager: Finished task 5.0 in stage 9.0 (TID 70) in 1672 ms on R1S1 (3/10)
16/04/18 03:16:55 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 9.0 (TID 69) in 1680 ms on R1S1 (4/10)
16/04/18 03:16:55 INFO scheduler.TaskSetManager: Finished task 7.0 in stage 9.0 (TID 72) in 1678 ms on R1S1 (5/10)
16/04/18 03:16:55 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 9.0 (TID 67) in 1682 ms on R1S1 (6/10)
16/04/18 03:16:55 INFO scheduler.TaskSetManager: Finished task 4.0 in stage 9.0 (TID 75) in 1710 ms on R1S1 (7/10)
16/04/18 03:16:55 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 9.0 (TID 74) in 1729 ms on R1S1 (8/10)
16/04/18 03:16:55 INFO scheduler.TaskSetManager: Finished task 9.0 in stage 9.0 (TID 68) in 1838 ms on R1S1 (9/10)
16/04/18 03:17:25 WARN scheduler.TaskSetManager: Lost task 6.0 in stage 9.0 (TID 71, R1S1): java.lang.IllegalArgumentException: requirement failed: sizeInBytes was negative: -1 at scala.Predef$.require(Predef.scala:233) at org.apache.spark.storage.BlockInfo.markReady(BlockInfo.scala:55) at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:822) at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:645) at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:153) at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78) at org.apache.spark.rdd.RDD.iterator(RDD.scala:268) at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:89) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 16/04/18 03:17:25 INFO scheduler.TaskSetManager: Starting task 6.1 in stage 9.0 (TID 76, R1S1, partition 6,PROCESS_LOCAL, 2171 bytes) 16/04/18 03:17:25 DEBUG hdfs.DFSClient: DataStreamer block BP-1194875811-10.3.1.3-1460617951862:blk_1073742842_2018 sending packet packet seqno:-1 offsetInBlock:0 lastPacketInBlock:false lastByteOffsetInBlock: 0 16/04/18 03:17:25 DEBUG hdfs.DFSClient: DFSClient seqno: -1 status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 653735 16/04/18 03:17:25 WARN scheduler.TaskSetManager: Lost task 6.1 in stage 9.0 (TID 76, R1S1): org.apache.spark.storage.BlockException: Block manager failed to return cached value for rdd_28_6! 
at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:158) at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78) at org.apache.spark.rdd.RDD.iterator(RDD.scala:268) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:89) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745)

We use the following script to submit the job:

/Hadoop/spark-1.6.0-bin-hadoop2.6/bin/spark-submit --class org.apache.spark.examples.graphx.Analytics /Hadoop/spark-1.6.0-bin-hadoop2.6/lib/spark-examples-1.6.0-hadoop2.6.0.jar pagerank /data/soc-LiveJournal1.txt --output=/output/live-off.res --numEPart=10 --numIter=1 --edgeStorageLevel=OFF_HEAP --vertexStorageLevel=OFF_HEAP

When we set the storage level to MEMORY_ONLY or DISK_ONLY, there is no error and the job finishes correctly. But when we set the storage level to OFF_HEAP, which means Tachyon is used for storage, the error occurs. The executor stack trace is as follows; it appears that writing the block to Tachyon failed.

16/04/18 03:16:54 INFO : Connecting local worker @ R1S1/10.3.1.1:29998
16/04/18 03:16:55 INFO : Connecting local worker @ R1S1/10.3.1.1:29998
16/04/18 03:16:55 INFO : Folder /mnt/ramdisk/tachyonworker/5291680305620365611 was created!
16/04/18 03:16:55 INFO : LocalBlockOutStream created new file block, block path: /mnt/ramdisk/tachyonworker/5291680305620365611/4076863488 16/04/18 03:16:55 INFO Executor: Finished task 6.0 in stage 6.0 (TID 59). 2973 bytes result sent to driver 16/04/18 03:17:25 ERROR ExternalBlockStore: Error in putValues(rdd_28_6) java.io.IOException: Fail to cache: null at tachyon.client.file.FileOutStream.handleCacheWriteException(FileOutStream.java:276) at tachyon.client.file.FileOutStream.close(FileOutStream.java:165) at org.apache.spark.storage.TachyonBlockManager.putValues(TachyonBlockManager.scala:126) at org.apache.spark.storage.ExternalBlockStore.putIntoExternalBlockStore(ExternalBlockStore.scala:79) at org.apache.spark.storage.ExternalBlockStore.putIterator(ExternalBlockStore.scala:67) at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:798) at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:645) at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:153) at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78) at org.apache.spark.rdd.RDD.iterator(RDD.scala:268) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:89) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.nio.channels.ClosedChannelException at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:99) at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:820) 
at tachyon.client.block.LocalBlockOutStream.flush(LocalBlockOutStream.java:108) at tachyon.client.block.LocalBlockOutStream.close(LocalBlockOutStream.java:92) at tachyon.client.file.FileOutStream.close(FileOutStream.java:160) ... 18 more 16/04/18 03:17:25 INFO : Connecting local worker @ R1S1/10.3.1.1:29998 16/04/18 03:17:25 WARN : Worker Client last execution took 29146 ms. Longer than the interval 1000 16/04/18 03:17:25 ERROR Executor: Exception in task 6.0 in stage 9.0 (TID 71) java.lang.IllegalArgumentException: requirement failed: sizeInBytes was negative: -1 at scala.Predef$.require(Predef.scala:233) at org.apache.spark.storage.BlockInfo.markReady(BlockInfo.scala:55) at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:822) at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:645) at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:153) at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78) at org.apache.spark.rdd.RDD.iterator(RDD.scala:268) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:89) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 16/04/18 03:17:25 INFO CoarseGrainedExecutorBackend: Got assigned task 76 16/04/18 03:17:25 INFO Executor: Running task 6.1 in stage 9.0 (TID 76) 16/04/18 03:17:25 ERROR ExternalBlockStore: Error in getValues(rdd_28_6) java.io.IOException: tachyon.exception.TachyonException: FileId 
4093640703 BlockIndex 0 is not a valid block. at tachyon.client.TachyonFS.getClientBlockInfo(TachyonFS.java:447) at tachyon.client.TachyonFile.getClientBlockInfo(TachyonFile.java:126) at tachyon.client.TachyonFile.getLocationHosts(TachyonFile.java:212) at org.apache.spark.storage.TachyonBlockManager.getValues(TachyonBlockManager.scala:152) at org.apache.spark.storage.ExternalBlockStore$$anonfun$getValues$1.apply(ExternalBlockStore.scala:147) at org.apache.spark.storage.ExternalBlockStore$$anonfun$getValues$1.apply(ExternalBlockStore.scala:147) at scala.Option.flatMap(Option.scala:170) at org.apache.spark.storage.ExternalBlockStore.getValues(ExternalBlockStore.scala:147) at org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:486) at org.apache.spark.storage.BlockManager.getLocal(BlockManager.scala:420) at org.apache.spark.storage.BlockManager.get(BlockManager.scala:625) at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:44) at org.apache.spark.rdd.RDD.iterator(RDD.scala:268) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:89) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: tachyon.exception.TachyonException: FileId 4093640703 BlockIndex 0 is not a valid block. at tachyon.client.FileSystemMasterClient.getFileBlockInfo(FileSystemMasterClient.java:151) at tachyon.client.TachyonFS.getClientBlockInfo(TachyonFS.java:444) ... 
22 more 16/04/18 03:17:25 INFO CacheManager: Partition rdd_28_6 not found, computing it 16/04/18 03:17:25 INFO BlockManager: Found block rdd_18_6 locally 16/04/18 03:17:25 WARN BlockManager: Block rdd_28_6 already exists on this machine; not re-adding it 16/04/18 03:17:25 ERROR ExternalBlockStore: Error in getValues(rdd_28_6) java.io.IOException: tachyon.exception.TachyonException: FileId 4093640703 BlockIndex 0 is not a valid block. at tachyon.client.TachyonFS.getClientBlockInfo(TachyonFS.java:447) at tachyon.client.TachyonFile.getClientBlockInfo(TachyonFile.java:126) at tachyon.client.TachyonFile.getLocationHosts(TachyonFile.java:212) at org.apache.spark.storage.TachyonBlockManager.getValues(TachyonBlockManager.scala:152) at org.apache.spark.storage.ExternalBlockStore$$anonfun$getValues$1.apply(ExternalBlockStore.scala:147) at org.apache.spark.storage.ExternalBlockStore$$anonfun$getValues$1.apply(ExternalBlockStore.scala:147) at scala.Option.flatMap(Option.scala:170) at org.apache.spark.storage.ExternalBlockStore.getValues(ExternalBlockStore.scala:147) at org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:486) at org.apache.spark.storage.BlockManager.getLocal(BlockManager.scala:420) at org.apache.spark.storage.BlockManager.get(BlockManager.scala:625) at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:154) at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78) at org.apache.spark.rdd.RDD.iterator(RDD.scala:268) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:89) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: tachyon.exception.TachyonException: FileId 4093640703 BlockIndex 0 is not a valid block. at tachyon.client.FileSystemMasterClient.getFileBlockInfo(FileSystemMasterClient.java:151) at tachyon.client.TachyonFS.getClientBlockInfo(TachyonFS.java:444) ... 23 more
16/04/18 03:17:25 INFO CacheManager: Failure to store rdd_28_6
16/04/18 03:17:25 ERROR Executor: Exception in task 6.1 in stage 9.0 (TID 76) org.apache.spark.storage.BlockException: Block manager failed to return cached value for rdd_28_6! at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:158) at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78) at org.apache.spark.rdd.RDD.iterator(RDD.scala:268) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:89) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745)

And when we check the Storage page of the job UI, we find some blocks are not cached:

RDD Name | Storage Level | Cached Partitions | Fraction Cached | Size in Memory | Size in ExternalBlockStore | Size on Disk
EdgeRDD | ExternalBlockStore Serialized 1x Replicated | 18 | 90% | 0.0 B | 753.2 MB | 0.0 B
VertexRDD | ExternalBlockStore Serialized 1x Replicated | 20 | 100% | 0.0 B | 191.1 MB | 0.0 B
VertexRDD, VertexRDD | ExternalBlockStore Serialized 1x Replicated | 20 | 100% | 0.0 B | 121.1 MB | 0.0 B
VertexRDD | ExternalBlockStore Serialized 1x Replicated | 20 | 100% | 0.0 B | 121.3 MB | 0.0 B
GraphLoader.edgeListFile - edges (/data/soc-LiveJournal1.txt) | ExternalBlockStore Serialized 1x Replicated | 20 | 100% | 0.0 B | 871.8 MB | 0.0 B
EdgeRDD | ExternalBlockStore Serialized 1x Replicated | 18 | 90% | 0.0 B | 1307.6 MB | 0.0 B

I have checked the configuration many times. Can anyone give me some ideas about this issue? It has puzzled me for 2 weeks.
Thanks

> Error occurs when using OFF_HEAP persistent level
> --------------------------------------------------
>
>                 Key: SPARK-14695
>                 URL: https://issues.apache.org/jira/browse/SPARK-14695
>             Project: Spark
>          Issue Type: Bug
>          Components: Block Manager, Spark Core
>    Affects Versions: 1.6.0
>         Environment: Spark 1.6.0
>                      Tachyon 0.8.2
>                      Hadoop 2.6.0
>            Reporter: Liang Lee
org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:822) > at > org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:645) > at > org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:153) > at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:268) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > at org.apache.spark.scheduler.Task.run(Task.scala:89) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > 16/04/18 03:17:25 INFO scheduler.TaskSetManager: Starting task 6.1 in stage > 9.0 (TID 76, R1S1, partition 6,PROCESS_LOCAL, 2171 bytes) > 16/04/18 03:17:25 DEBUG hdfs.DFSClient: DataStreamer block > BP-1194875811-10.3.1.3-1460617951862:blk_1073742842_2018 sending packet > packet seqno:-1 offsetInBlock:0 lastPacketInBlock:false > lastByteOffsetInBlock: 0 > 16/04/18 03:17:25 DEBUG hdfs.DFSClient: DFSClient seqno: -1 status: SUCCESS > status: SUCCESS downstreamAckTimeNanos: 653735 > 16/04/18 03:17:25 WARN scheduler.TaskSetManager: Lost task 6.1 in stage 9.0 > (TID 76, R1S1): org.apache.spark.storage.BlockException: Block manager failed > to return cached value for rdd_28_6! 
> at > org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:158) > at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:268) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > at org.apache.spark.scheduler.Task.run(Task.scala:89) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > We use the following script to submit the job: > /Hadoop/spark-1.6.0-bin-hadoop2.6/bin/spark-submit --class > org.apache.spark.examples.graphx.Analytics > /Hadoop/spark-1.6.0-bin-hadoop2.6/lib/spark-examples-1.6.0-hadoop2.6.0.jar > pagerank /data/soc-LiveJournal1.txt --output=/output/live-off.res > --numEPart=10 --numIter=1 --edgeStorageLevel=OFF_HEAP > --vertexStorageLevel=OFF_HEAP > When we set the storage level to MEMORY_ONLY or DISK_ONLY, there is no error > and the job can finished correctly. > But when we set the storage level to OFF_HEAP, which means using Tachyon for > the storage process, the error occurs. > The executors stack is like this, seems the write block to Tahcyon failed. > 116/04/18 03:16:54 INFO : Connecting local worker @ R1S1/10.3.1.1:29998 > 16/04/18 03:16:55 INFO : Connecting local worker @ R1S1/10.3.1.1:29998 > 16/04/18 03:16:55 INFO : Folder > /mnt/ramdisk/tachyonworker/5291680305620365611 was created! 
16/04/18 03:16:55 INFO : LocalBlockOutStream created new file block, block path: /mnt/ramdisk/tachyonworker/5291680305620365611/4076863488
16/04/18 03:16:55 INFO Executor: Finished task 6.0 in stage 6.0 (TID 59). 2973 bytes result sent to driver
16/04/18 03:17:25 ERROR ExternalBlockStore: Error in putValues(rdd_28_6)
java.io.IOException: Fail to cache: null
    at tachyon.client.file.FileOutStream.handleCacheWriteException(FileOutStream.java:276)
    at tachyon.client.file.FileOutStream.close(FileOutStream.java:165)
    at org.apache.spark.storage.TachyonBlockManager.putValues(TachyonBlockManager.scala:126)
    at org.apache.spark.storage.ExternalBlockStore.putIntoExternalBlockStore(ExternalBlockStore.scala:79)
    at org.apache.spark.storage.ExternalBlockStore.putIterator(ExternalBlockStore.scala:67)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:798)
    at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:645)
    at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:153)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.ClosedChannelException
    at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:99)
    at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:820)
    at tachyon.client.block.LocalBlockOutStream.flush(LocalBlockOutStream.java:108)
    at tachyon.client.block.LocalBlockOutStream.close(LocalBlockOutStream.java:92)
    at tachyon.client.file.FileOutStream.close(FileOutStream.java:160)
    ... 18 more
16/04/18 03:17:25 INFO : Connecting local worker @ R1S1/10.3.1.1:29998
16/04/18 03:17:25 WARN : Worker Client last execution took 29146 ms. Longer than the interval 1000
16/04/18 03:17:25 ERROR Executor: Exception in task 6.0 in stage 9.0 (TID 71)
java.lang.IllegalArgumentException: requirement failed: sizeInBytes was negative: -1
    at scala.Predef$.require(Predef.scala:233)
    at org.apache.spark.storage.BlockInfo.markReady(BlockInfo.scala:55)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:822)
    at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:645)
    at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:153)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
16/04/18 03:17:25 INFO CoarseGrainedExecutorBackend: Got assigned task 76
16/04/18 03:17:25 INFO Executor: Running task 6.1 in stage 9.0 (TID 76)
16/04/18 03:17:25 ERROR ExternalBlockStore: Error in getValues(rdd_28_6)
java.io.IOException: tachyon.exception.TachyonException: FileId 4093640703 BlockIndex 0 is not a valid block.
    at tachyon.client.TachyonFS.getClientBlockInfo(TachyonFS.java:447)
    at tachyon.client.TachyonFile.getClientBlockInfo(TachyonFile.java:126)
    at tachyon.client.TachyonFile.getLocationHosts(TachyonFile.java:212)
    at org.apache.spark.storage.TachyonBlockManager.getValues(TachyonBlockManager.scala:152)
    at org.apache.spark.storage.ExternalBlockStore$$anonfun$getValues$1.apply(ExternalBlockStore.scala:147)
    at org.apache.spark.storage.ExternalBlockStore$$anonfun$getValues$1.apply(ExternalBlockStore.scala:147)
    at scala.Option.flatMap(Option.scala:170)
    at org.apache.spark.storage.ExternalBlockStore.getValues(ExternalBlockStore.scala:147)
    at org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:486)
    at org.apache.spark.storage.BlockManager.getLocal(BlockManager.scala:420)
    at org.apache.spark.storage.BlockManager.get(BlockManager.scala:625)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:44)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: tachyon.exception.TachyonException: FileId 4093640703 BlockIndex 0 is not a valid block.
    at tachyon.client.FileSystemMasterClient.getFileBlockInfo(FileSystemMasterClient.java:151)
    at tachyon.client.TachyonFS.getClientBlockInfo(TachyonFS.java:444)
    ... 22 more
16/04/18 03:17:25 INFO CacheManager: Partition rdd_28_6 not found, computing it
16/04/18 03:17:25 INFO BlockManager: Found block rdd_18_6 locally
16/04/18 03:17:25 WARN BlockManager: Block rdd_28_6 already exists on this machine; not re-adding it
16/04/18 03:17:25 ERROR ExternalBlockStore: Error in getValues(rdd_28_6)
java.io.IOException: tachyon.exception.TachyonException: FileId 4093640703 BlockIndex 0 is not a valid block.
    at tachyon.client.TachyonFS.getClientBlockInfo(TachyonFS.java:447)
    at tachyon.client.TachyonFile.getClientBlockInfo(TachyonFile.java:126)
    at tachyon.client.TachyonFile.getLocationHosts(TachyonFile.java:212)
    at org.apache.spark.storage.TachyonBlockManager.getValues(TachyonBlockManager.scala:152)
    at org.apache.spark.storage.ExternalBlockStore$$anonfun$getValues$1.apply(ExternalBlockStore.scala:147)
    at org.apache.spark.storage.ExternalBlockStore$$anonfun$getValues$1.apply(ExternalBlockStore.scala:147)
    at scala.Option.flatMap(Option.scala:170)
    at org.apache.spark.storage.ExternalBlockStore.getValues(ExternalBlockStore.scala:147)
    at org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:486)
    at org.apache.spark.storage.BlockManager.getLocal(BlockManager.scala:420)
    at org.apache.spark.storage.BlockManager.get(BlockManager.scala:625)
    at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:154)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: tachyon.exception.TachyonException: FileId 4093640703 BlockIndex 0 is not a valid block.
    at tachyon.client.FileSystemMasterClient.getFileBlockInfo(FileSystemMasterClient.java:151)
    at tachyon.client.TachyonFS.getClientBlockInfo(TachyonFS.java:444)
    ... 23 more
16/04/18 03:17:25 INFO CacheManager: Failure to store rdd_28_6
16/04/18 03:17:25 ERROR Executor: Exception in task 6.1 in stage 9.0 (TID 76)
org.apache.spark.storage.BlockException: Block manager failed to return cached value for rdd_28_6!
    at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:158)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

And when we check the Storage page of the job UI, we find some blocks are not cached:

RDD Name                                                      | Storage Level                               | Cached Partitions | Fraction Cached | Size in Memory | Size in ExternalBlockStore | Size on Disk
EdgeRDD                                                       | ExternalBlockStore Serialized 1x Replicated | 18                | 90%             | 0.0 B          | 753.2 MB                   | 0.0 B
VertexRDD                                                     | ExternalBlockStore Serialized 1x Replicated | 20                | 100%            | 0.0 B          | 191.1 MB                   | 0.0 B
VertexRDD, VertexRDD                                          | ExternalBlockStore Serialized 1x Replicated | 20                | 100%            | 0.0 B          | 121.1 MB                   | 0.0 B
VertexRDD                                                     | ExternalBlockStore Serialized 1x Replicated | 20                | 100%            | 0.0 B          | 121.3 MB                   | 0.0 B
GraphLoader.edgeListFile - edges (/data/soc-LiveJournal1.txt) | ExternalBlockStore Serialized 1x Replicated | 20                | 100%            | 0.0 B          | 871.8 MB                   | 0.0 B
EdgeRDD                                                       | ExternalBlockStore Serialized 1x Replicated | 18                | 90%             | 0.0 B          | 1307.6 MB                  | 0.0 B

I have checked the configuration many times. Can anyone give me some ideas about this issue? It has puzzled me for two weeks.
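For reference, the failing run corresponds roughly to the following GraphX program. This is an untested sketch against the Spark 1.6 API, not the exact Analytics driver from the examples jar; the input path, output path, partition count, and storage levels are taken from the submit script above, and a running Spark 1.6 cluster with Tachyon configured for OFF_HEAP is assumed:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.GraphLoader
import org.apache.spark.storage.StorageLevel

object PageRankOffHeapRepro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PageRankOffHeapRepro"))

    // StorageLevel.OFF_HEAP triggers the failure in our runs; switching this
    // single value to MEMORY_ONLY or DISK_ONLY lets the job finish, which is
    // why the problem looks specific to the ExternalBlockStore/Tachyon path.
    val level = StorageLevel.OFF_HEAP

    val graph = GraphLoader.edgeListFile(
      sc, "/data/soc-LiveJournal1.txt",
      numEdgePartitions = 10,          // matches --numEPart=10
      edgeStorageLevel = level,        // matches --edgeStorageLevel
      vertexStorageLevel = level)      // matches --vertexStorageLevel

    // One PageRank iteration, matching --numIter=1 in the submit script.
    val ranks = graph.staticPageRank(1).vertices
    ranks.saveAsTextFile("/output/live-off.res")
    sc.stop()
  }
}
```

The first job that materializes the edge and vertex RDDs is where the Tachyon put fails and the retried task then hits the "not a valid block" read error shown above.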
Thanks

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)