@Thomas Thanks. My input files are sorted.
@Jingkei Thanks. I will have a look at the instructions for join.
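
For reference, the idea discussed in the thread below (tag every value with the file it came from, then order the values of each key by a fixed file order instead of by file name) could look roughly like this. It is only a sketch in plain Java with no Hadoop dependency, so it simulates what the mapper tagging and the secondary sort would do rather than showing a real job. The class TaggedJoinSketch, the Tagged type and the fileOrder list are names made up for illustration; in a real job the file name would presumably come from the 'map.input.file' key Thomas mentions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Hadoop-free simulation of joining three key-value files.
class TaggedJoinSketch {

    // One key-value pair plus the name of the file it was read from.
    // Assumption: a real mapper would look the file name up in the job
    // configuration ('map.input.file'); here it is simply passed in.
    public static final class Tagged {
        public final String key;
        public final String file;
        public final String value;

        public Tagged(String key, String file, String value) {
            this.key = key;
            this.file = file;
            this.value = value;
        }
    }

    // Sorts by key first and by position of the source file in fileOrder
    // second, then groups the values per key. Files missing from fileOrder
    // get index -1 and therefore sort before all listed files.
    public static Map<String, List<String>> join(List<Tagged> records,
                                                 final List<String> fileOrder) {
        List<Tagged> sorted = new ArrayList<>(records);
        Comparator<Tagged> byKeyThenFile = Comparator
                .comparing((Tagged t) -> t.key)
                .thenComparingInt(t -> fileOrder.indexOf(t.file));
        sorted.sort(byKeyThenFile);

        // LinkedHashMap keeps the keys in the sorted order they were inserted.
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (Tagged t : sorted) {
            grouped.computeIfAbsent(t.key, k -> new ArrayList<>()).add(t.value);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // File names are "random", as in the example from the thread.
        List<String> fileOrder = Arrays.asList("bbbb", "aaaa", "ccccc");
        List<Tagged> records = Arrays.asList(
                new Tagged("key2", "ccccc", "value2C"),
                new Tagged("key1", "aaaa", "value1B"),
                new Tagged("key1", "ccccc", "value1C"),
                new Tagged("key2", "bbbb", "value2A"),
                new Tagged("key1", "bbbb", "value1A"),
                new Tagged("key2", "aaaa", "value2B"));

        for (Map.Entry<String, List<String>> e : join(records, fileOrder).entrySet()) {
            System.out.println(e.getKey() + "-(" + String.join(",", e.getValue()) + ")");
        }
        // prints:
        // key1-(value1A,value1B,value1C)
        // key2-(value2A,value2B,value2C)
    }
}
```

Passing the file order explicitly is what avoids the problem with random file names: the values come out as (A,B,C) even though "aaaa" would sort before "bbbb" alphabetically.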

On Tue, Oct 27, 2009 at 12:39 AM, Thomas Thevis <[email protected]> wrote:

> Hey Anty,
>
> there exists a config key 'map.input.file' which should return the name of
> the input file the mapper gets its input values from.
> In the pre-hadoop-0.20.0 era, one would have to implement the configure()
> method to have access to the configuration. Since then, it should be
> possible to use the configuration from the context object.
> However, if your input files aren't sorted in any way, this approach won't
> work.
>
> Best Regards
> Thomas
>
> Anty wrote:
>
>> Thanks very much for your reply, Thomas.
>> I searched the Mapper.map() method, but I still can't find a way to
>> retrieve the source file name of the input data. Can you describe it in
>> more detail?
>> I also have some doubts about your suggestion: the names of the three
>> files are random, so sorting the values by file name will not necessarily
>> match the order (value1A,value1B,value1C), e.g.
>>
>> "bbbb"          "aaaa"          "ccccc"
>> key1-value1A    key1-value1B    key1-value1C
>>
>> If we then sort the values by file name, the result will be
>> "key1-(value1B,value1A,value1C)" or "key1-(value1C,value1A,value1B)".
>> Maybe I should use some particular rule to sort the values.
>> Thanks, Thomas.
>>
>> On Mon, Oct 26, 2009 at 11:36 PM, Anty <[email protected]> wrote:
>>
>> Up to now I don't know how to retrieve the source file name of the
>> input data within the Mapper.map() method. Anyway, I have some doubts
>> about your proposed suggestion.
>>
>> On Mon, Oct 26, 2009 at 8:59 PM, Thomas Thevis <[email protected]> wrote:
>>
>> Hi Anty,
>>
>> as far as I know, it is possible to retrieve the source file name of the
>> input data within the Mapper's map() method.
>> If so, you could use secondary sort on values (have a look at the Hadoop
>> wiki pages) to propagate the values, sorted first by key and second by
>> file name, to the Reducer, which could aggregate them in any particular
>> way.
>>
>> Hope that helps
>> Thomas
>>
>> Anty wrote:
>>
>> Does MultipleInputs meet this situation?
>> Does anyone have any idea about this?
>>
>> On Mon, Oct 26, 2009 at 7:44 PM, Anty <[email protected]> wrote:
>>
>> Hi all,
>> I have the following use case: I have three files, and each file
>> consists of key-value pairs:
>>
>> file1:          file2:          file3:
>> key1-value1A    key1-value1B    key1-value1C
>> key2-value2A    key2-value2B    key2-value2C
>> key3-value3A    key3-value3B    key3-value3C
>> .....           ......          .....
>>
>> Now I want to write an MR job to generate a file,
>>
>> file4:
>> key1-(value1A,value1B,value1C)
>> key2-(value2A,value2B,value2C)
>> key3-(value3A,value3B,value3C)
>> ..........
>>
>> Any suggestion will be appreciated.
>>
>> --
>> Best Regards
>> Anty Rao
>
> --
> Thomas Thevis
> Software Developer
> ------------------------------------------------------------
> vionto GmbH
> Karl-Marx-Allee 90a, D-10243 Berlin
>
> phone +49 30 40 20 3 29 - 28
> fax   +49 30 40 20 3 29 - 29
> web   http://www.vionto.com
> ------------------------------------------------------------
> Managing Directors: Ralf von Grafenstein, Dr. Martin C. Hirsch
> Registered office: Berlin
> Amtsgericht Berlin Charlottenburg, HRB 108054B
> ------------------------------------------------------------

--
Best Regards
Anty Rao
