1) The MapReduce model uses file-based communication, so each mapper can run independently. For example, if an MR job over 1 GB of input schedules 5 mappers but there are only 2 task slots (a single machine), the job is slow but still works: 2 map tasks run while 3 map tasks stay pending.
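The scheduling difference can be sketched with a little arithmetic; the numbers (5 tasks, 2 slots) come from the example above, and the function names are made up purely for illustration, not taken from any Hadoop or Hama API:

```python
import math

def mapreduce_waves(tasks, slots):
    # MR tasks communicate through files, so pending tasks can
    # simply wait for a free slot and run in a later wave.
    return math.ceil(tasks / slots)

def bsp_runnable(tasks, slots):
    # BSP tasks communicate over the network within supersteps,
    # so every task must hold a slot at the same time.
    return slots >= tasks

print(mapreduce_waves(5, 2))  # 3 waves: 2 running, 3 pending, slow but works
print(bsp_runnable(5, 2))     # False: the BSP job cannot be scheduled at all
```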
However, unlike MapReduce, BSP uses network-based communication. This means that all of the BSP tasks must run at the same time, and the number of BSP tasks is determined by the number of blocks in the input. So you CANNOT run a job over 1 GB of input on a single machine. It's not a memory issue.

> a scalable system should never throw OOM exceptions, instead it may
> eventually process items slower (with caches / queues) but never throw
> an exception for that but that's just my opinion.

I hope so too, but I think you are talking about Iterative MapReduce.

2) The normal block size of HDFS is 64 ~ 256 MB. If we can assume that split size = block size, I feel that the current system is enough. I don't think we have to spend time implementing a disk-based something. WDYT?

On Tue, Feb 25, 2014 at 12:19 AM, Anastasis Andronidis <[email protected]> wrote:

> On 24 Feb 2014, at 3:32 p.m., Tommaso Teofili <[email protected]> wrote:
>
>>> According to my personal evaluations, the current system is fairly
>>> respectable. As I mentioned before, I believe we should stick to the
>>> in-memory style, since today's machines can be equipped with up to
>>> 128 GB. A disk (or disk-hybrid) based queue is optional, not a
>>> must-have.
>>
>> Right, the only thing that I think we need to address before 0.7.0 is
>> related to the OutOfMemory errors (especially when dealing with large
>> graphs); for example, IMHO, even if the memory is not enough to store all
>> the graph vertices assigned to a certain peer, a scalable system should
>> never throw OOM exceptions; instead it may eventually process items
>> slower (with caches / queues) but never throw an exception for that, but
>> that's just my opinion.
>
> I like and agree with this.
>
> Cheers,
> Anastasis

--
Edward J. Yoon (@eddieyoon)
Chief Executive Officer
DataSayer, Inc.
