I think I found the answer: apply(flags: Int, replication: Int): StorageLevel
<http://spark.incubator.apache.org/docs/latest/api/core/org/apache/spark/storage/StorageLevel.html>
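For reference, the StorageLevel companion object also has a boolean overload, StorageLevel.apply(useDisk, useMemory, deserialized, replication), which is an easier way to build a custom level than the flags-based form. Below is a minimal sketch of the MEMORY_ONLY_3 idea from the earlier mail, assuming the Spark 0.9 Scala API; the object name and the sample data are made up for illustration:

import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel

object ReplicatedCacheSketch {
  def main(args: Array[String]) {
    val sc = new SparkContext("local[2]", "ReplicatedCacheSketch")

    // A custom level equivalent to a hypothetical MEMORY_ONLY_3:
    // in memory only (no disk), deserialized, 3 replicas per partition.
    val MEMORY_ONLY_3 = StorageLevel(false, true, true, 3)

    val rdd = sc.parallelize(1 to 1000000)
    rdd.persist(MEMORY_ONLY_3)
    println(rdd.count())

    sc.stop()
  }
}

With replication >= 2, a copy of each cached partition should survive the loss of a single worker, so queries against the cached table can keep running; replication does not by itself make the workers highly available. Also note that on a single local JVM the extra replicas have nowhere to go, so the replication only takes effect on a real cluster.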
2014-03-20 17:00 GMT+08:00 qingyang li <liqingyang1...@gmail.com>:

> Can someone help me?
>
> 2014-03-12 21:26 GMT+08:00 qingyang li <liqingyang1...@gmail.com>:
>
>> In addition, on this page:
>> https://spark.apache.org/docs/0.9.0/scala-programming-guide.html#hadoop-datasets
>> I see that an RDD can be stored using different storage levels, and I
>> also see StorageLevel's attribute MEMORY_ONLY_2:
>> "MEMORY_ONLY_2 - Same as the levels above, but replicate each partition
>> on two cluster nodes."
>> 1. Is this one aspect of fault tolerance?
>> 2. Does replicating each partition on two cluster nodes help with
>> worker node HA?
>> 3. Is there a MEMORY_ONLY_3 which replicates each partition on three
>> cluster nodes?
>>
>> 2014-03-12 12:11 GMT+08:00 qingyang li <liqingyang1...@gmail.com>:
>>
>>> I have one table in memory; when one worker dies, I cannot query data
>>> from that table. Here is its storage status
>>> (from http://192.168.1.101:4040/storage/rdd?id=47):
>>>
>>> RDD Name | Storage Level                     | Cached Partitions | Fraction Cached | Size in Memory | Size on Disk
>>> table01  | Memory Deserialized 1x Replicated | 119               | 88%             | 697.0 MB       | 0.0 B
>>>
>>> So, my questions are:
>>> 1. What does "Memory Deserialized 1x Replicated" mean?
>>> 2. How do I configure worker HA so that I can query data even when one
>>> worker is dead?