Hey Viva,

If you're just getting started with HDFS, I recommend thinking about more than just seek time vs. transfer time when deciding what to set the default block size to. Tom White makes a great point about why the block size is generally large, but there are other factors to consider as well. Tom is basically saying that if you set the block size to 100 MB, it will take at least a second to read the block off disk, and then you can do some MapReduce processing on it. If you instead set the block size to 10 MB, it would take 10 ms to do the disk seek and 100 ms to read the 10 MB off disk, so now about 10% of your disk time is wasted on disk head seeks instead of useful reads.
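To make that arithmetic concrete, here is a quick back-of-the-envelope sketch in plain Java (nothing Hadoop-specific; the class name is made up, and the 10 ms seek and 100 MB/s transfer rate are just the assumed numbers from this thread):

public class BlockSizeEstimate {
    public static void main(String[] args) {
        // Assumed numbers from this thread: ~10 ms disk seek, ~100 MB/s sequential read.
        double seekTimeMs = 10.0;
        double transferRateMBps = 100.0;

        // To keep seek at 1% of transfer time, the transfer must take
        // 10 ms / 0.01 = 1000 ms, and in 1000 ms the disk streams 100 MB,
        // hence the ~100 MB block size in Tom's example.
        double targetSeekFraction = 0.01;
        double transferTimeMs = seekTimeMs / targetSeekFraction;            // 1000 ms
        double blockSizeMB = transferRateMBps * (transferTimeMs / 1000.0);  // 100 MB
        System.out.printf("Block size for %.1f%% seek overhead: %.0f MB%n",
                targetSeekFraction * 100, blockSizeMB);

        // Compare with a 10 MB block: the read takes only 100 ms, so the
        // 10 ms seek is ~10% of the time spent per block.
        double smallBlockMB = 10.0;
        double readTimeMs = smallBlockMB / transferRateMBps * 1000.0;       // 100 ms
        System.out.printf("10 MB block: seek is %.0f%% of read time%n",
                seekTimeMs / readTimeMs * 100);
    }
}

(Plug 0.005 or 0.02 in for targetSeekFraction and you get 200 MB or 50 MB, which is the same arithmetic behind your follow-up question quoted below.)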
Anyway, there are some other factors to consider. The block size helps determine the number of Map tasks that get launched to process your data.

For example, say you want to do MapReduce analysis on a 10 TB file in HDFS. If the file's block size is 128 MB, you will have 81,920 unique blocks making up that file:

    (10 terabytes) / (128 megabytes) = 81,920

With the default replication factor of 3, you now have 245,760 blocks across the cluster comprising that file in HDFS:

    81,920 * 3 = 245,760

Since there are 81,920 unique blocks making up that file, the MapReduce framework will by default launch 81,920 Map tasks to process it (you can influence MapReduce to use more or fewer maps by calling setNumMapTasks(int)). If you make your block size 256 MB, only 40,960 Map tasks would be launched to process the file. With a 1 GB block size, only 10,240 map tasks would launch. If only 10,240 map tasks launch and each map task has to read 1 GB at 100 MB/s, it would take about 10 seconds for each map task to read its 1 GB chunk. (A quick standalone sketch of this arithmetic is at the bottom of this mail, below the quoted thread.)

So, the point is that your block size can affect how fast or slow your MapReduce jobs run. With many smaller blocks (128 MB) you get more map tasks and more parallelism, so the job will often run faster than with fewer, larger blocks (1 GB). That said, you typically want around 10-100 maps per node, and since setting up the Java Virtual Machine for a map takes a while, it's best if each map takes at least a minute to execute.

Also, on a side note, an HDFS block doesn't always behave like a Linux ext3 block. If the HDFS block size is 128 MB but the file you want to write to HDFS is only 25 MB, then that block will only take up 25 MB on disk. So not every block is exactly 128 MB; some may be smaller.

Finally, the block size and replication factor are configurable per file, but you should set a good default for both based on your environment and use case.

--
Sameer Farooqui
Systems Architect / Hortonworks


On Thu, Feb 23, 2012 at 6:43 AM, viva v <vivamail...@gmail.com> wrote:

> Thanks very much for the clarification.
>
> So, we'd i guess ideally set the block size equal to the transfer rate for
> optimum results.
>
> If seek time has to be 0.5% of transfer time would i set my block size at
> 200MB (higher than transfer rate)?
> Conversely if seek time has to be 2% of transfer time would i still set my
> block size at 100MB?
>
> On Wed, Feb 22, 2012 at 8:16 PM, Praveen Sripati <praveensrip...@gmail.com
> > wrote:
>
>> Seek time is ~ 10ms. If seek time has to be 1% of the transfer time then
>> transfer time has to be ~ 1000 ms (1s).
>>
>> In ~ 1000 ms (1s) with a transfer rate of 100 MB/s, a block of 100MB can
>> be read.
>>
>> Praveen
>>
>> On Wed, Feb 22, 2012 at 11:22 AM, viva v <vivamail...@gmail.com> wrote:
>>
>>> Have just started getting familiar with Hadoop & HDFS. Reading Tom
>>> White's book.
>>>
>>> The book describes an example related to HDFS block size. Here's a
>>> verbatim excerpt from the book:
>>>
>>> "If the seek time is around 10 ms, and the transfer rate is 100 MB/s,
>>> then to make the seek time 1% of the transfer time, we need to make the
>>> block size around 100 MB."
>>>
>>> I can't seem to understand how we arrived at the fact that block size
>>> should be 100MB.
>>>
>>> Could someone please help me understand?
>>>
>>> Thanks
>>> Viva
>>>
>>
>>
>
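P.S. Here is the block-count arithmetic from the body of this mail as a tiny standalone sketch (again plain Java with a made-up class name; it assumes the same hypothetical 10 TB file, 3x replication, 100 MB/s disk read, and the default behavior of roughly one map task per block described above):

public class MapTaskEstimate {
    public static void main(String[] args) {
        long fileSizeMB = 10L * 1024 * 1024;   // the hypothetical 10 TB file, in MB
        int replication = 3;                   // default HDFS replication factor
        double transferRateMBps = 100.0;       // assumed sequential disk read rate

        for (long blockSizeMB : new long[] {128, 256, 1024}) {
            long uniqueBlocks = fileSizeMB / blockSizeMB;   // by default, one map task per block
            long totalBlocks = uniqueBlocks * replication;  // blocks stored across the cluster
            double readSecsPerMap = blockSizeMB / transferRateMBps;
            System.out.printf(
                    "%4d MB blocks: %,d map tasks, %,d replicated blocks, ~%.1f s to read each block%n",
                    blockSizeMB, uniqueBlocks, totalBlocks, readSecsPerMap);
        }
    }
}

Running it prints the 81,920 / 40,960 / 10,240 map-task counts from above, along with the roughly 1.3 s, 2.6 s, and 10 s per-block read times.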