Files are stored as blocks, and the default block size is 64 MB. You can change this by setting the dfs.block.size property. Map/Reduce reads files in large chunks of bytes called splits. Splits are not physical; think of them as logical records that give the starting byte offset within the file and the length of the split. Each mapper generally takes one split and processes it. You can also configure the minimum split size by setting the mapred.min.split.size property. Hope this helps.
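In case it's useful, here's a minimal sketch of setting both properties programmatically through the old org.apache.hadoop.mapred JobConf API (the class name and the byte values are just illustrative, not anything special):

    import org.apache.hadoop.mapred.JobConf;

    public class SplitConfigExample {
        public static void main(String[] args) {
            JobConf conf = new JobConf(SplitConfigExample.class);
            // dfs.block.size: HDFS block size in bytes for files this job
            // writes (128 MB here; the cluster default is 64 MB).
            conf.setLong("dfs.block.size", 128L * 1024 * 1024);
            // mapred.min.split.size: lower bound on split size in bytes,
            // so each mapper sees at least this much data (64 MB here).
            conf.setLong("mapred.min.split.size", 64L * 1024 * 1024);
            System.out.println("block size = " + conf.get("dfs.block.size"));
            System.out.println("min split  = " + conf.get("mapred.min.split.size"));
        }
    }

You can also set the same properties cluster-wide in hadoop-site.xml instead of per job.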
-Jim

On Tue, Apr 14, 2009 at 1:05 PM, Foss User <[email protected]> wrote:
> In the documentation I was reading that files are stored as file
> splits in the HDFS. What is the size of each file split? Is it
> configurable? If yes, how can I configure it?
