A couple of Questions on InputFormat

Steve Lewis Sat, 21 Sep 2013 12:31:08 -0700

Classes implementing InputFormat implement
 public List<InputSplit> getSplits(JobContext job), which returns a List of
InputSplits. For FileInputFormat the splits have a Path, a start and a
length (and therefore an end).
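
To show what I mean by a split having a path, a start and an end, here is a
minimal plain-Java sketch. It uses no Hadoop classes: SplitSketch, Range and
the path /data/input.txt are hypothetical names, and the fixed-size chopping
is a simplification I am assuming for illustration, not the actual
FileInputFormat logic.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitSketch {

    // Hypothetical stand-in for a file split: a path plus a byte range.
    static final class Range {
        final String path;
        final long start;
        final long length;

        Range(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }

        // Exclusive end offset of this range.
        long end() {
            return start + length;
        }
    }

    // Chop a file of fileLength bytes into consecutive ranges of at most
    // splitSize bytes. Assumption: real split sizing is more involved.
    static List<Range> split(String path, long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Range> ranges = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            long length = Math.min(splitSize, fileLength - start);
            ranges.add(new Range(path, start, length));
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (Range r : split("/data/input.txt", 250L, 100L)) {
            System.out.println(r.path + " [" + r.start + ", " + r.end() + ")");
        }
    }
}
```

In this model the ranges are contiguous and non-overlapping, so every byte
of the file belongs to exactly one split.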


1) When is this method called, on which JVM and on which machine, and is it
called only once?

2) Does the number of map tasks correspond to the number of splits returned
by getSplits?

3) InputFormat declares a method
 RecordReader<K,V> createRecordReader(InputSplit split, TaskAttemptContext
context). Is this executed within the JVM of the Mapper on the slave
machine, and does the RecordReader run within that JVM?

4) The default RecordReaders read a file from the start position to the end
position, emitting values in the order read. With such a reader, assuming it
is reading lines of text, is it reasonable to assume that the values the
mapper receives are in the same order they were found in the file? Would it,
for example, be possible for WordCount to see a word that was hyphenated at
the end of one line and append the first word of the next line it sees
(ignoring the case where the word is at the end of a split)?
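
To make question 4 concrete, here is a minimal plain-Java sketch of the kind
of state-carrying logic I have in mind. It uses no Hadoop classes:
HyphenJoinSketch and its map/cleanup methods are hypothetical stand-ins for
a Mapper, and the whole thing only works if lines reach map() in file order,
which is exactly the assumption I am asking about.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class HyphenJoinSketch {

    // Fragment carried over from the previous line, or null if none.
    private String pending = null;

    // Called once per line, in arrival order. Returns the complete words
    // found so far; a trailing "frag-" is held back for the next line.
    List<String> map(String line) {
        List<String> words = new ArrayList<>();
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return words; // blank line: keep any pending fragment as-is
        }
        String[] tokens = trimmed.split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            String token = tokens[i];
            if (i == 0 && pending != null) {
                token = pending + token; // glue fragment to first word
                pending = null;
            }
            boolean last = (i == tokens.length - 1);
            if (last && token.length() > 1 && token.endsWith("-")) {
                // Drop the hyphen and wait for the rest of the word.
                pending = token.substring(0, token.length() - 1);
            } else {
                words.add(token);
            }
        }
        return words;
    }

    // Called after the last line: emit a fragment that was never completed
    // (for example, a word hyphenated at the end of a split).
    List<String> cleanup() {
        if (pending == null) {
            return Collections.emptyList();
        }
        List<String> rest = new ArrayList<>();
        rest.add(pending);
        pending = null;
        return rest;
    }

    public static void main(String[] args) {
        HyphenJoinSketch mapper = new HyphenJoinSketch();
        System.out.println(mapper.map("a word that was hyphen-"));
        System.out.println(mapper.map("ated at the end"));
        System.out.println(mapper.cleanup());
    }
}
```

If lines could arrive out of order, the pending fragment would be glued to
the wrong word, so the sketch depends entirely on the ordering guarantee.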
