Just a follow-up on this thread. I tried Hierarchical Storage on Tachyon (http://tachyon-project.org/Hierarchy-Storage-on-Tachyon.html), and that seems to have worked: I did not see any Spark job fail due to BlockNotFoundException. Below are my Hierarchical Storage settings:
-Dtachyon.worker.hierarchystore.level.max=2
-Dtachyon.worker.hierarchystore.level0.alias=MEM
-Dtachyon.worker.hierarchystore.level0.dirs.path=$TACHYON_RAM_FOLDER
-Dtachyon.worker.hierarchystore.level0.dirs.quota=$TACHYON_WORKER_MEMORY_SIZE
-Dtachyon.worker.hierarchystore.level1.alias=HDD
-Dtachyon.worker.hierarchystore.level1.dirs.path=/mnt/tachyon
-Dtachyon.worker.hierarchystore.level1.dirs.quota=50GB
-Dtachyon.worker.allocate.strategy=MAX_FREE
-Dtachyon.worker.evict.strategy=LRU

Regards,
Dibyendu

On Thu, May 7, 2015 at 1:46 PM, Dibyendu Bhattacharya <dibyendu.bhattach...@gmail.com> wrote:

> Dear All,
>
> I have been playing with Spark Streaming on Tachyon as the OFF_HEAP block
> store. The primary reason for evaluating Tachyon is to find out whether it
> can solve the Spark BlockNotFoundException.
>
> With the traditional MEMORY_ONLY StorageLevel, jobs fail with a block not
> found exception when blocks are evicted, and storing blocks in
> MEMORY_AND_DISK is not a good option either, as it impacts the throughput
> a lot.
>
> To test how Tachyon behaves, I took the latest Spark 1.4 from master, used
> Tachyon 0.6.4, and configured Tachyon in Fault Tolerant Mode. Tachyon is
> running on a 3-node AWS x-large cluster and Spark is running on a 3-node
> AWS x-large cluster.
>
> I used the low-level receiver-based Kafka consumer
> (https://github.com/dibbhatt/kafka-spark-consumer), which I wrote, to pull
> from Kafka and write blocks to Tachyon.
>
> I found a similar improvement in throughput (as in the MEMORY_ONLY case)
> but very good overall memory utilization (as it is an off-heap store).
>
> But I found one issue on which I need clarification.
>
> In the Tachyon case I also see BlockNotFoundException, but for a different
> reason. From what I see, TachyonBlockManager.scala puts the blocks with the
> WriteType.TRY_CACHE configuration. Because of this, blocks are evicted
> from the Tachyon cache, and when Spark tries to find the block it throws
> BlockNotFoundException.
>
> I see a pull request which discusses the same:
>
> https://github.com/apache/spark/pull/158#discussion_r11195271
>
> When I modified the WriteType to CACHE_THROUGH, the BlockDropException was
> gone, but it again impacts the throughput.
>
> Just curious to know whether Tachyon has any setting which can solve the
> block eviction from cache to disk, other than explicitly setting
> CACHE_THROUGH?
>
> Regards,
> Dibyendu
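
For illustration only, here is a toy Python model of why adding a second storage level might make the BlockNotFoundException go away. It is not Tachyon code. It assumes (this is an assumption, not something verified against the Tachyon source) that with LRU eviction and a level1 HDD tier, blocks pushed out of the MEM tier are moved down to HDD rather than dropped. The class name `TieredStore` and its parameters are hypothetical, and capacities are counted in blocks rather than bytes to keep the sketch short.

```python
from collections import OrderedDict


class TieredStore:
    """Toy block store: a small MEM tier with an optional HDD tier below it.

    Hypothetical model for illustration; capacities are in number of blocks.
    """

    def __init__(self, mem_capacity, hdd_capacity=0):
        self.mem_capacity = mem_capacity
        self.hdd_capacity = hdd_capacity
        # OrderedDict keeps least recently used entries at the front.
        self.mem = OrderedDict()
        self.hdd = OrderedDict()

    def put(self, block_id, data):
        """Write a block to MEM, evicting LRU blocks if MEM is over capacity."""
        self.mem[block_id] = data
        self.mem.move_to_end(block_id)
        while len(self.mem) > self.mem_capacity:
            old_id, old_data = self.mem.popitem(last=False)
            if self.hdd_capacity > 0:
                # Two-level case: demote the evicted block to HDD.
                self.hdd[old_id] = old_data
                self.hdd.move_to_end(old_id)
                while len(self.hdd) > self.hdd_capacity:
                    self.hdd.popitem(last=False)
            # Single-level case: the evicted block is simply lost.

    def get(self, block_id):
        """Read a block from MEM first, then HDD; raise KeyError if absent."""
        if block_id in self.mem:
            self.mem.move_to_end(block_id)
            return self.mem[block_id]
        if block_id in self.hdd:
            return self.hdd[block_id]
        raise KeyError("block not found: %r" % (block_id,))


if __name__ == "__main__":
    single = TieredStore(mem_capacity=2)                   # MEM only
    tiered = TieredStore(mem_capacity=2, hdd_capacity=10)  # MEM + HDD
    for i in range(4):
        single.put(i, "data-%d" % i)
        tiered.put(i, "data-%d" % i)

    print(tiered.get(0))  # still readable, served from the HDD tier
    try:
        single.get(0)
    except KeyError as err:
        print("single-level store lost the block:", err)
```

In this model the MEM-only store behaves like the eviction problem described in the quoted message: once block 0 is evicted, a later read fails. With the second tier, the same read succeeds, at the cost of a slower tier serving it.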