Florin,

Your second example is how it happens in Hadoop, but there's more to understand here.

To start with, your InputFormat (input splitter) computes and publishes a set of InputSplits. The total number of input splits is the total number of map tasks Hadoop will run for the job. The input splits are generally block splits, i.e., a start offset and a length over a single file. Each map task is assigned exactly one split from this list. So every map task initializes separately, in its own JVM (no shared resources -- again, it's a separate Mapper instance per file or block!) and reads its input split alone, passing each record to its map(key, value, context) function.
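If you want to see this for yourself, here is a minimal sketch of such a Mapper (the class name and output types are my own, and it assumes the default TextInputFormat, whose splits are FileSplits) that tags every record with the file its task's split points at. Within any single task, that path never changes:

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class SplitAwareMapper
    extends Mapper<LongWritable, Text, Text, LongWritable> {

  private Path splitPath;

  @Override
  protected void setup(Context context) {
    // Under FileInputFormat, this task's single InputSplit is a FileSplit
    // carrying the file path plus the start offset and length of the
    // block range this task owns.
    FileSplit split = (FileSplit) context.getInputSplit();
    splitPath = split.getPath();
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Every record this task sees comes from the same split, so
    // splitPath stays constant for the task's whole lifetime.
    context.write(new Text(splitPath.getName()), key);
  }
}

Run that over a multi-file input directory and each task's output will carry exactly one file name: the one-split-per-task behavior described above, made visible.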
So to summarize, your second example is what will happen, except in parallel across tasks rather than sequentially, something like:

map1  | map2  | …
file1 | file2 | …
row1  | row1  | …
row2  | row2  | …

P.S. What I've explained here is the default behavior. Things can of course be tweaked heavily to achieve other patterns, like your first example, but those would likely come with greater read costs attached. The 'Hadoop' way is data-local, one file (or block) per task.

On Wed, Jul 20, 2011 at 12:11 PM, Florin P <florinp...@yahoo.com> wrote:
> Hello!
> Suppose that we have the files F1, F2, ... Fk given by the input splitter
> to the map class. In what order will they arrive when the map function is
> applied?
> What interests me is whether mixed key-value pairs from different files
> can arrive at the map function. Will the keys arrive grouped by their
> file, until no more keys are left in the source file, or can they arrive
> as one key from F1, one key from Fk, and so on?
> Example:
> Mixed key-value pairs at the map function:
> K1 from F1
> K5 from F5
> K7 from F8
> etc.
>
> Ordered key-value pairs:
> K1 from F1
> ..
> K_end_F1 from F1
> K5 from F5
> ..
> K_end_F5 from F5
> and so on.
>
> I look forward to your answer.
> Regards,
> Florin

--
Harsh J