[ https://issues.apache.org/jira/browse/SPARK-14695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liang Lee updated SPARK-14695:
------------------------------
    Description: 
When running a PageRank job through the default examples, e.g., the class 
'org.apache.spark.examples.graphx.Analytics' in the 
spark-examples-1.6.0-hadoop2.6.0.jar package, we got the following errors:
16/04/18 03:16:55 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 9.0 
(TID 66) in 1662 ms on R1S1 (1/10)
16/04/18 03:16:55 INFO scheduler.TaskSetManager: Finished task 8.0 in stage 9.0 
(TID 73) in 1663 ms on R1S1 (2/10)
16/04/18 03:16:55 INFO scheduler.TaskSetManager: Finished task 5.0 in stage 9.0 
(TID 70) in 1672 ms on R1S1 (3/10)
16/04/18 03:16:55 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 9.0 
(TID 69) in 1680 ms on R1S1 (4/10)
16/04/18 03:16:55 INFO scheduler.TaskSetManager: Finished task 7.0 in stage 9.0 
(TID 72) in 1678 ms on R1S1 (5/10)
16/04/18 03:16:55 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 9.0 
(TID 67) in 1682 ms on R1S1 (6/10)
16/04/18 03:16:55 INFO scheduler.TaskSetManager: Finished task 4.0 in stage 9.0 
(TID 75) in 1710 ms on R1S1 (7/10)
16/04/18 03:16:55 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 9.0 
(TID 74) in 1729 ms on R1S1 (8/10)
16/04/18 03:16:55 INFO scheduler.TaskSetManager: Finished task 9.0 in stage 9.0 
(TID 68) in 1838 ms on R1S1 (9/10)
16/04/18 03:17:25 WARN scheduler.TaskSetManager: Lost task 6.0 in stage 9.0 
(TID 71, R1S1): java.lang.IllegalArgumentException: requirement failed: 
sizeInBytes was negative: -1
        at scala.Predef$.require(Predef.scala:233)
        at org.apache.spark.storage.BlockInfo.markReady(BlockInfo.scala:55)
        at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:822)
        at 
org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:645)
        at 
org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:153)
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

16/04/18 03:17:25 INFO scheduler.TaskSetManager: Starting task 6.1 in stage 9.0 
(TID 76, R1S1, partition 6,PROCESS_LOCAL, 2171 bytes)
16/04/18 03:17:25 DEBUG hdfs.DFSClient: DataStreamer block 
BP-1194875811-10.3.1.3-1460617951862:blk_1073742842_2018 sending packet packet 
seqno:-1 offsetInBlock:0 lastPacketInBlock:false lastByteOffsetInBlock: 0
16/04/18 03:17:25 DEBUG hdfs.DFSClient: DFSClient seqno: -1 status: SUCCESS 
status: SUCCESS downstreamAckTimeNanos: 653735
16/04/18 03:17:25 WARN scheduler.TaskSetManager: Lost task 6.1 in stage 9.0 
(TID 76, R1S1): org.apache.spark.storage.BlockException: Block manager failed 
to return cached value for rdd_28_6!
        at 
org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:158)
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)



We use the following script to submit the job:
/Hadoop/spark-1.6.0-bin-hadoop2.6/bin/spark-submit --class 
org.apache.spark.examples.graphx.Analytics 
/Hadoop/spark-1.6.0-bin-hadoop2.6/lib/spark-examples-1.6.0-hadoop2.6.0.jar 
pagerank /data/soc-LiveJournal1.txt --output=/output/live-off.res --numEPart=10 
--numIter=1 --edgeStorageLevel=OFF_HEAP --vertexStorageLevel=OFF_HEAP

When we set the storage level to MEMORY_ONLY or DISK_ONLY, there is no error 
and the job finishes correctly.
But when we set the storage level to OFF_HEAP, which in Spark 1.6 means 
storing cached blocks in Tachyon, the error occurs.
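
For reference, a minimal sketch of how these flags reach GraphX (roughly what 
the Analytics example does with them; 'sc' is an existing SparkContext, and the 
exact option handling in Analytics.scala may differ):

    import org.apache.spark.graphx.GraphLoader
    import org.apache.spark.storage.StorageLevel

    // --edgeStorageLevel / --vertexStorageLevel are parsed with StorageLevel.fromString;
    // in Spark 1.6, OFF_HEAP routes cached blocks through the ExternalBlockStore (Tachyon).
    val edgeLevel = StorageLevel.fromString("OFF_HEAP")
    val vertexLevel = StorageLevel.fromString("OFF_HEAP")

    val graph = GraphLoader.edgeListFile(sc, "/data/soc-LiveJournal1.txt",
        numEdgePartitions = 10,
        edgeStorageLevel = edgeLevel,
        vertexStorageLevel = vertexLevel)
      .cache()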

The executor stack trace is shown below; it seems the block write to Tachyon failed.

16/04/18 03:16:54 INFO : Connecting local worker @ R1S1/10.3.1.1:29998
16/04/18 03:16:55 INFO : Connecting local worker @ R1S1/10.3.1.1:29998
16/04/18 03:16:55 INFO : Folder /mnt/ramdisk/tachyonworker/5291680305620365611 
was created!
16/04/18 03:16:55 INFO : LocalBlockOutStream created new file block, block 
path: /mnt/ramdisk/tachyonworker/5291680305620365611/4076863488
16/04/18 03:16:55 INFO Executor: Finished task 6.0 in stage 6.0 (TID 59). 2973 
bytes result sent to driver
16/04/18 03:17:25 ERROR ExternalBlockStore: Error in putValues(rdd_28_6)
java.io.IOException: Fail to cache: null
        at 
tachyon.client.file.FileOutStream.handleCacheWriteException(FileOutStream.java:276)
        at tachyon.client.file.FileOutStream.close(FileOutStream.java:165)
        at 
org.apache.spark.storage.TachyonBlockManager.putValues(TachyonBlockManager.scala:126)
        at 
org.apache.spark.storage.ExternalBlockStore.putIntoExternalBlockStore(ExternalBlockStore.scala:79)
        at 
org.apache.spark.storage.ExternalBlockStore.putIterator(ExternalBlockStore.scala:67)
        at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:798)
        at 
org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:645)
        at 
org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:153)
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.ClosedChannelException
        at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:99)
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:820)
        at 
tachyon.client.block.LocalBlockOutStream.flush(LocalBlockOutStream.java:108)
        at 
tachyon.client.block.LocalBlockOutStream.close(LocalBlockOutStream.java:92)
        at tachyon.client.file.FileOutStream.close(FileOutStream.java:160)
        ... 18 more
16/04/18 03:17:25 INFO : Connecting local worker @ R1S1/10.3.1.1:29998
16/04/18 03:17:25 WARN : Worker Client last execution took 29146 ms. Longer 
than the interval 1000
16/04/18 03:17:25 ERROR Executor: Exception in task 6.0 in stage 9.0 (TID 71)
java.lang.IllegalArgumentException: requirement failed: sizeInBytes was 
negative: -1
        at scala.Predef$.require(Predef.scala:233)
        at org.apache.spark.storage.BlockInfo.markReady(BlockInfo.scala:55)
        at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:822)
        at 
org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:645)
        at 
org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:153)
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
16/04/18 03:17:25 INFO CoarseGrainedExecutorBackend: Got assigned task 76
16/04/18 03:17:25 INFO Executor: Running task 6.1 in stage 9.0 (TID 76)
16/04/18 03:17:25 ERROR ExternalBlockStore: Error in getValues(rdd_28_6)
java.io.IOException: tachyon.exception.TachyonException: FileId 4093640703 
BlockIndex 0 is not a valid block.
        at tachyon.client.TachyonFS.getClientBlockInfo(TachyonFS.java:447)
        at tachyon.client.TachyonFile.getClientBlockInfo(TachyonFile.java:126)
        at tachyon.client.TachyonFile.getLocationHosts(TachyonFile.java:212)
        at 
org.apache.spark.storage.TachyonBlockManager.getValues(TachyonBlockManager.scala:152)
        at 
org.apache.spark.storage.ExternalBlockStore$$anonfun$getValues$1.apply(ExternalBlockStore.scala:147)
        at 
org.apache.spark.storage.ExternalBlockStore$$anonfun$getValues$1.apply(ExternalBlockStore.scala:147)
        at scala.Option.flatMap(Option.scala:170)
        at 
org.apache.spark.storage.ExternalBlockStore.getValues(ExternalBlockStore.scala:147)
        at 
org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:486)
        at 
org.apache.spark.storage.BlockManager.getLocal(BlockManager.scala:420)
        at org.apache.spark.storage.BlockManager.get(BlockManager.scala:625)
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:44)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: tachyon.exception.TachyonException: FileId 4093640703 BlockIndex 0 
is not a valid block.
        at 
tachyon.client.FileSystemMasterClient.getFileBlockInfo(FileSystemMasterClient.java:151)
        at tachyon.client.TachyonFS.getClientBlockInfo(TachyonFS.java:444)
        ... 22 more
16/04/18 03:17:25 INFO CacheManager: Partition rdd_28_6 not found, computing it
16/04/18 03:17:25 INFO BlockManager: Found block rdd_18_6 locally
16/04/18 03:17:25 WARN BlockManager: Block rdd_28_6 already exists on this 
machine; not re-adding it
16/04/18 03:17:25 ERROR ExternalBlockStore: Error in getValues(rdd_28_6)
java.io.IOException: tachyon.exception.TachyonException: FileId 4093640703 
BlockIndex 0 is not a valid block.
        at tachyon.client.TachyonFS.getClientBlockInfo(TachyonFS.java:447)
        at tachyon.client.TachyonFile.getClientBlockInfo(TachyonFile.java:126)
        at tachyon.client.TachyonFile.getLocationHosts(TachyonFile.java:212)
        at 
org.apache.spark.storage.TachyonBlockManager.getValues(TachyonBlockManager.scala:152)
        at 
org.apache.spark.storage.ExternalBlockStore$$anonfun$getValues$1.apply(ExternalBlockStore.scala:147)
        at 
org.apache.spark.storage.ExternalBlockStore$$anonfun$getValues$1.apply(ExternalBlockStore.scala:147)
        at scala.Option.flatMap(Option.scala:170)
        at 
org.apache.spark.storage.ExternalBlockStore.getValues(ExternalBlockStore.scala:147)
        at 
org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:486)
        at 
org.apache.spark.storage.BlockManager.getLocal(BlockManager.scala:420)
        at org.apache.spark.storage.BlockManager.get(BlockManager.scala:625)
        at 
org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:154)
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: tachyon.exception.TachyonException: FileId 4093640703 BlockIndex 0 
is not a valid block.
        at 
tachyon.client.FileSystemMasterClient.getFileBlockInfo(FileSystemMasterClient.java:151)
        at tachyon.client.TachyonFS.getClientBlockInfo(TachyonFS.java:444)
        ... 23 more
16/04/18 03:17:25 INFO CacheManager: Failure to store rdd_28_6
16/04/18 03:17:25 ERROR Executor: Exception in task 6.1 in stage 9.0 (TID 76)
org.apache.spark.storage.BlockException: Block manager failed to return cached 
value for rdd_28_6!
        at 
org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:158)
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
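
As far as we can tell, the IllegalArgumentException above comes from the 
non-negative size check that the block manager applies when marking a block 
ready (BlockInfo.scala:55 in the trace). A minimal, self-contained illustration 
of that check (our reading of the trace, not a copy of the Spark source):

    object MarkReadyCheck {
      // BlockInfo.markReady requires a non-negative size; when the Tachyon write
      // fails, the ExternalBlockStore apparently reports -1, which trips the check.
      def markReady(sizeInBytes: Long): Unit =
        require(sizeInBytes >= 0, s"sizeInBytes was negative: $sizeInBytes")

      def main(args: Array[String]): Unit =
        markReady(-1L) // throws: requirement failed: sizeInBytes was negative: -1
    }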

And when we check the Storage page of the job UI, we find some blocks are not 
cached:

RDD Name | Storage Level | Cached Partitions | Fraction Cached | Size in Memory | Size in ExternalBlockStore | Size on Disk
EdgeRDD | ExternalBlockStore Serialized 1x Replicated | 18 | 90% | 0.0 B | 753.2 MB | 0.0 B
VertexRDD | ExternalBlockStore Serialized 1x Replicated | 20 | 100% | 0.0 B | 191.1 MB | 0.0 B
VertexRDD, VertexRDD | ExternalBlockStore Serialized 1x Replicated | 20 | 100% | 0.0 B | 121.1 MB | 0.0 B
VertexRDD | ExternalBlockStore Serialized 1x Replicated | 20 | 100% | 0.0 B | 121.3 MB | 0.0 B
GraphLoader.edgeListFile - edges (/data/soc-LiveJournal1.txt) | ExternalBlockStore Serialized 1x Replicated | 20 | 100% | 0.0 B | 871.8 MB | 0.0 B
EdgeRDD | ExternalBlockStore Serialized 1x Replicated | 18 | 90% | 0.0 B | 1307.6 MB | 0.0 B

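For reference, the Spark 1.6 properties that govern the external block store 
are, to our understanding, the ones below (the values shown are illustrative 
placeholders, not a dump of our cluster settings):

    spark.externalBlockStore.blockManager   org.apache.spark.storage.TachyonBlockManager
    spark.externalBlockStore.url            tachyon://<tachyon-master>:19998
    spark.externalBlockStore.baseDir        /tmp_spark_tachyon
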
I have checked these settings and the rest of the configuration many times. 
Can anyone give me some ideas about this issue? It has puzzled me for two weeks.
Thanks

  was:
When running a PageRank job through the default examples, e.g., the class 
'org.apache.spark.examples.graphx.Analytics' in the 
spark-examples-1.6.0-hadoop2.6.0.jar package, we got the following errors:
16/04/18 02:30:01 WARN scheduler.TaskSetManager: Lost task 9.0 in stage 6.0 
(TID 53, R1S1): java.lang.IllegalArgumentException: requirement failed: 
sizeInBytes was negative: -1
        at scala.Predef$.require(Predef.scala:233)
        at org.apache.spark.storage.BlockInfo.markReady(BlockInfo.scala:55)
        at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:822)
        at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:645)
        at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:153)
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)


We use the following script to submit the job:
/Hadoop/spark-1.6.0-bin-hadoop2.6/bin/spark-submit --class 
org.apache.spark.examples.graphx.Analytics 
/Hadoop/spark-1.6.0-bin-hadoop2.6/lib/spark-examples-1.6.0-hadoop2.6.0.jar 
pagerank /data/soc-LiveJournal1.txt --output=/output/live-off.res --numEPart=10 
--numIter=1 --edgeStorageLevel=OFF_HEAP --vertexStorageLevel=OFF_HEAP

When we set the storage level to MEMORY_ONLY or DISK_ONLY, there is no error 
and the job finishes correctly.
But when we set the storage level to OFF_HEAP, which in Spark 1.6 means 
storing cached blocks in Tachyon, the error occurs.

The executor stack trace is shown below; it seems the block write to Tachyon failed.
16/04/18 02:25:54 ERROR ExternalBlockStore: Error in putValues(rdd_20_1)
java.io.IOException: Fail to cache: null
        at 
tachyon.client.file.FileOutStream.handleCacheWriteException(FileOutStream.java:276)
        at tachyon.client.file.FileOutStream.close(FileOutStream.java:165)
        at 
org.apache.spark.storage.TachyonBlockManager.putValues(TachyonBlockManager.scala:126)
        at 
org.apache.spark.storage.ExternalBlockStore.putIntoExternalBlockStore(ExternalBlockStore.scala:79)
        at 
org.apache.spark.storage.ExternalBlockStore.putIterator(ExternalBlockStore.scala:67)
        at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:798)
        at 
org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:645)
        at 
org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:153)
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
        at 
org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
        at 
org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.ClosedChannelException
        at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:99)
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:820)
        at 
tachyon.client.block.LocalBlockOutStream.flush(LocalBlockOutStream.java:108)
        at 
tachyon.client.block.LocalBlockOutStream.close(LocalBlockOutStream.java:92)
        at tachyon.client.file.FileOutStream.close(FileOutStream.java:160)
        ... 31 more
16/04/18 02:25:54 ERROR Executor: Exception in task 1.0 in stage 10.0 (TID 142)
java.lang.IllegalArgumentException: requirement failed: sizeInBytes was 
negative: -1
        at scala.Predef$.require(Predef.scala:233)
        at org.apache.spark.storage.BlockInfo.markReady(BlockInfo.scala:55)
        at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:822)
        at 
org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:645)
        at 
org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:153)
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
        at 
org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
        at 
org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
16/04/18 02:25:54 INFO CoarseGrainedExecutorBackend: Got assigned task 160
16/04/18 02:25:54 INFO Executor: Running task 1.1 in stage 10.0 (TID 160)
16/04/18 02:25:54 INFO CacheManager: Partition rdd_30_1 not found, computing it
16/04/18 02:25:54 ERROR ExternalBlockStore: Error in getValues(rdd_20_1)
java.io.IOException: tachyon.exception.TachyonException: FileId 1660944383 
BlockIndex 0 is not a valid block.
        at tachyon.client.TachyonFS.getClientBlockInfo(TachyonFS.java:447)
        at tachyon.client.TachyonFile.getClientBlockInfo(TachyonFile.java:126)
        at tachyon.client.TachyonFile.getLocationHosts(TachyonFile.java:212)
        at 
org.apache.spark.storage.TachyonBlockManager.getValues(TachyonBlockManager.scala:152)
        at 
org.apache.spark.storage.ExternalBlockStore$$anonfun$getValues$1.apply(ExternalBlockStore.scala:147)
        at 
org.apache.spark.storage.ExternalBlockStore$$anonfun$getValues$1.apply(ExternalBlockStore.scala:147)
        at scala.Option.flatMap(Option.scala:170)
        at 
org.apache.spark.storage.ExternalBlockStore.getValues(ExternalBlockStore.scala:147)
        at 
org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:486)
        at 
org.apache.spark.storage.BlockManager.getLocal(BlockManager.scala:420)
        at org.apache.spark.storage.BlockManager.get(BlockManager.scala:625)
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:44)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
        at 
org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
        at 
org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: tachyon.exception.TachyonException: FileId 1660944383 BlockIndex 0 
is not a valid block.
        at 
tachyon.client.FileSystemMasterClient.getFileBlockInfo(FileSystemMasterClient.java:151)
        at tachyon.client.TachyonFS.getClientBlockInfo(TachyonFS.java:444)
        ... 35 more
16/04/18 02:25:54 INFO CacheManager: Partition rdd_20_1 not found, computing it
16/04/18 02:25:54 INFO BlockManager: Found block rdd_3_1 locally
16/04/18 02:25:54 WARN BlockManager: Block rdd_20_1 already exists on this 
machine; not re-adding it


And when we check the Storage page of the job UI, we find some blocks are not 
cached:

RDD Name | Storage Level | Cached Partitions | Fraction Cached | Size in Memory | Size in ExternalBlockStore | Size on Disk
EdgeRDD | ExternalBlockStore Serialized 1x Replicated | 18 | 90% | 0.0 B | 753.2 MB | 0.0 B
VertexRDD | ExternalBlockStore Serialized 1x Replicated | 20 | 100% | 0.0 B | 191.1 MB | 0.0 B
VertexRDD, VertexRDD | ExternalBlockStore Serialized 1x Replicated | 20 | 100% | 0.0 B | 121.1 MB | 0.0 B
VertexRDD | ExternalBlockStore Serialized 1x Replicated | 20 | 100% | 0.0 B | 121.3 MB | 0.0 B
GraphLoader.edgeListFile - edges (/data/soc-LiveJournal1.txt) | ExternalBlockStore Serialized 1x Replicated | 20 | 100% | 0.0 B | 871.8 MB | 0.0 B
EdgeRDD | ExternalBlockStore Serialized 1x Replicated | 18 | 90% | 0.0 B | 1307.6 MB | 0.0 B

I have checked the configuration many times. Can anyone give me some ideas 
about this issue? It has puzzled me for two weeks.
Thanks


> Error occurs when using OFF_HEAP persistence level 
> --------------------------------------------------
>
>                 Key: SPARK-14695
>                 URL: https://issues.apache.org/jira/browse/SPARK-14695
>             Project: Spark
>          Issue Type: Bug
>          Components: Block Manager, Spark Core
>    Affects Versions: 1.6.0
>         Environment: Spark 1.6.0
> Tachyon 0.8.2
> Hadoop 2.6.0
>            Reporter: Liang Lee
>


