An InputSplit defines a Mapper's input and has characteristics similar to those of an HDFS block (offset, length, locations). However, an InputSplit is computed by an InputFormat class to suit the input's requirements (such as newline boundaries in text files, which HDFS does not take into account when it splits incoming data into blocks). It can therefore span multiple blocks or be smaller than one (for example, via minimum or maximum split size configurations).
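To make the sizing part concrete, here is a rough sketch in plain Java. It is not the Hadoop source: the class name SplitSizeSketch and the helper splitOffsets are made up for illustration, and it ignores record boundaries and the slack that Hadoop allows on the last split. The max(min, min(max, block)) rule is, as far as I know, what the new-API FileInputFormat uses to pick a split size.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Hadoop.
class SplitSizeSketch {

    // Split size is the block size, clamped by the configured minimum and maximum.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Carves a file of the given length into {offset, length} pairs.
    // Simplification: no record-boundary handling, no slack on the last split.
    static List<long[]> splitOffsets(long fileLength, long splitSize) {
        List<long[]> splits = new ArrayList<>();
        long offset = 0;
        while (offset < fileLength) {
            long length = Math.min(splitSize, fileLength - offset);
            splits.add(new long[] {offset, length});
            offset += length;
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 64 * mb; // assumed block size for the example

        // A minimum larger than the block size makes a split span several blocks.
        System.out.println(computeSplitSize(blockSize, 128 * mb, Long.MAX_VALUE) / mb);

        // A maximum smaller than the block size makes a split smaller than a block.
        System.out.println(computeSplitSize(blockSize, 1L, 32 * mb) / mb);

        for (long[] s : splitOffsets(200 * mb, blockSize)) {
            System.out.println("offset=" + s[0] + " length=" + s[1]);
        }
    }
}
```

With the limits left at their extremes, the split size simply equals the block size, which is why splits and blocks often line up in practice even though they are separate concepts.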

On Fri, Jan 14, 2011 at 6:53 PM, Pedro Costa <psdc1...@gmail.com> wrote:
> For example, if the location of a input split is at
> /DataCenter1/Rack1/Node1, this means that this is the location of the
> namenode, and not the physical location of the data blocks?

--
Harsh J
www.harshj.com