fault tolerance. As files are uploaded to our server, we can continuously write the data in small chunks; if the server fails, we can tolerate the failure by switching the user to another server, where they can continue writing. Otherwise we would either have to hold the data on the server until we have the whole file before writing it to Hadoop (if the server fails, we lose all the data), or require the user to cache all the data they are generating, which is not feasible for our requirements.
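To make that concrete, here is a rough sketch of the chunked-write idea (not code from this thread): an upload relay that appends each chunk to HDFS as it arrives and flushes it, so a crashed server loses at most the last unflushed chunk. The class and method names (ChunkedUploadRelay, relayUpload) are made up for illustration, and hflush() assumes a newer Hadoop client (older releases expose sync() instead).

import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ChunkedUploadRelay {

    // Copy an incoming upload stream into HDFS in small chunks, flushing
    // after each chunk so data already received survives a server crash.
    public static void relayUpload(InputStream upload, Path target) throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        byte[] chunk = new byte[2 * 1024 * 1024]; // 2MB chunks, matching the block size discussed above
        try (FSDataOutputStream out = fs.create(target)) {
            int n;
            while ((n = upload.read(chunk)) != -1) {
                out.write(chunk, 0, n);
                out.hflush(); // push the chunk to the datanodes (sync() on older releases)
            }
        }
    }
}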
I appreciate your comment on this.

Cagdas

On Fri, May 2, 2008 at 1:09 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
> Why did you pick such a small block size?
>
> Why not go with the default of 64MB?
>
> That would give you only 10 million blocks for your 600TB.
>
> I don't see any advantage to the tiny block size.
>
> On 5/2/08 1:06 PM, "Cagdas Gerede" <[EMAIL PROTECTED]> wrote:
>
> > Thanks Doug for your answers. Our interest is more in the distributed
> > file system part than in map reduce.
> > I must confess that our block size is not as large as what a lot of
> > people configure. I would appreciate your and others' input.
> >
> > Do you think these numbers are suitable?
> >
> > We will have 5 million files, each having 20 blocks of 2MB. With the
> > minimum replication of 3, we would have 300 million blocks.
> > 300 million blocks would store 600TB. At ~10TB/node, this means a
> > 60 node system.
> >
> > Do you think these numbers are suitable for Hadoop DFS?
> >
> > Cagdas
> >
> >
> >> At ~100M per block, 100M blocks would store 10PB. At ~1TB/node, this
> >> means a ~10,000 node system, larger than Hadoop currently supports
> >> well (for this and other reasons).
> >>
> >> Doug
> >>
> >
>

--
------------
Best Regards,
Cagdas Evren Gerede
Home Page: http://cagdasgerede.info
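For reference, a quick back-of-the-envelope sketch (not from the thread) that reproduces the capacity-based block counts being compared above: roughly 300 million blocks at 2MB versus about 10 million at the 64MB default.

public class BlockCountEstimate {
    public static void main(String[] args) {
        long files = 5_000_000L;
        long fileSizeMB = 20 * 2;      // 20 blocks of 2MB per file = 40MB
        long replication = 3;
        long totalMB = files * fileSizeMB * replication; // 600,000,000 MB = 600TB replicated

        for (long blockMB : new long[] {2, 64}) {
            // capacity-based estimate, as used in the messages above
            System.out.printf("block size %2dMB -> %,d blocks%n", blockMB, totalMB / blockMB);
        }
        // prints: block size  2MB -> 300,000,000 blocks
        //         block size 64MB -> 9,375,000 blocks (Ted's "only 10 million")
    }
}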