Thanks, Doug, for your answers. Our interest is more in the distributed file system part than in MapReduce. I must confess that our block size is not as large as what many people configure. I would appreciate your and others' input.
Do you think the following numbers are suitable for Hadoop DFS? We will have 5 million files, each having 20 blocks of 2MB. With a minimum replication of 3, we would have 300 million blocks, which would store 600TB. At ~10TB/node, this means a 60-node system.

Cagdas

> At ~100M per block, 100M blocks would store 10PB. At ~1TB/node, this means
> a ~10,000 node system, larger than Hadoop currently supports well (for this
> and other reasons).
>
> Doug

--
------------
Best Regards,
Cagdas Evren Gerede
Home Page: http://cagdasgerede.info
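
P.S. In case it helps, here is the back-of-the-envelope arithmetic behind the numbers above, as a quick Python sketch (nothing Hadoop-specific; the figures are ours, not defaults):

files = 5_000_000        # 5 million files
blocks_per_file = 20     # 20 blocks per file
block_size_mb = 2        # 2 MB per block (much smaller than the common 64-128 MB)
replication = 3          # minimum replication factor
node_capacity_tb = 10    # ~10 TB of raw disk per node

blocks = files * blocks_per_file * replication        # 300,000,000 block replicas
raw_storage_tb = blocks * block_size_mb / 1_000_000   # ~600 TB including replicas
nodes = raw_storage_tb / node_capacity_tb             # ~60 nodes

print(blocks, raw_storage_tb, nodes)                  # 300000000 600.0 60.0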