Re: how to write this MapReduce

Thomas Thevis Mon, 26 Oct 2009 09:39:38 -0700

Hey Anty,

There is a config key 'map.input.file' which should return the name of the input file the mapper gets its input values from. In the pre-hadoop-0.20.0 era, one had to implement the configure() method to get access to the configuration. Since then, it should be possible to use the configuration from the context object. However, if your input files aren't sorted in any way, this approach won't work.
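
A minimal plain-Java sketch of the tagging idea, with no Hadoop dependency so that it runs on its own. The class `SourceTaggingSketch`, the method `mapRecord`, the tab-separated tag format, and the assumption that '-' separates key and value are illustrative choices, not something stated in this thread. In a real mapper the `sourceFile` argument would be read from the configuration key 'map.input.file' (in configure() with the old API, or possibly through the context object's configuration with the new one).

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

class SourceTaggingSketch {
    // Stand-in for one map() call. Assumption: each input line looks like
    // "key1-value1A", with the first '-' separating key and value.
    // sourceFile is hypothetical here; in a Hadoop job it would be the
    // value of the 'map.input.file' configuration key.
    static Map.Entry<String, String> mapRecord(String sourceFile, String line) {
        int dash = line.indexOf('-');
        String key = line.substring(0, dash).trim();
        String value = line.substring(dash + 1).trim();
        // Emit the key unchanged and prefix the value with its source file,
        // so the reduce side can tell the three inputs apart.
        return new SimpleEntry<>(key, sourceFile + "\t" + value);
    }
}
```

For the record `key1-value1A` read from a file named `bbbb`, `mapRecord` emits the key `key1` with the file name and the value joined by a tab, which is enough for the reducer to know which file each value came from.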


Best Regards
Thomas


Anty wrote:

Thanks very much for your reply, Thomas.

I searched the Mapper.map() method, but I still can't find a way to retrieve the source file name of the input data. Can you describe it in more detail?

I also have some doubts about your suggestion.

The names of the three files are random, so sorting the values by file name will not reproduce the order (value1A,value1B,value1C). For example:

"bbbb"            "aaaa"            "ccccc"

key1-value1A      key1-value1B      key1-value1C

If we sort the values by file name, the result will be "key1-(value1B,value1A,value1C)" or "key1-(value1C,value1A,value1B)".
Maybe I should use some particular rule to sort the values.
Thanks Thomas.
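
One way to make such a "particular rule" concrete is an explicit list of file names in the desired output order, used to rank the values inside the reducer. The sketch below is plain Java without Hadoop. `FileOrderSketch`, `FILE_ORDER`, and the tab-separated "fileName, value" tag format are assumptions made for illustration, and it presumes each value reaches the reducer already tagged with its source file name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.StringJoiner;

class FileOrderSketch {
    // Hypothetical rule: the position in this list decides the output order,
    // independent of how the file names sort alphabetically.
    static final List<String> FILE_ORDER = Arrays.asList("bbbb", "aaaa", "ccccc");

    // Stand-in for one reduce() call. Each tagged value is assumed to be
    // "fileName<TAB>value"; untagged values are not handled in this sketch.
    static String reduce(String key, List<String> taggedValues) {
        List<String[]> parts = new ArrayList<>();
        for (String tagged : taggedValues) {
            parts.add(tagged.split("\t", 2));
        }
        // Rank by position in FILE_ORDER; unknown files get -1 and sort first.
        parts.sort(Comparator.comparingInt((String[] p) -> FILE_ORDER.indexOf(p[0])));

        StringJoiner joined = new StringJoiner(",", key + "-(", ")");
        for (String[] p : parts) {
            joined.add(p[1]);
        }
        return joined.toString();
    }
}
```

With the three example files, the values for key1 come out in the order value1A, value1B, value1C regardless of the order in which they arrive. Sorting in the reducer like this is only reasonable when the number of values per key is small, as it is here with three files.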

On Mon, Oct 26, 2009 at 11:36 PM, Anty wrote:


    Up to now I don't know how to retrieve the source file name of the
    input data within the Mapper.map() method. Anyway, I have some doubts
    about your proposed suggestion.


    On Mon, Oct 26, 2009 at 8:59 PM, Thomas Thevis wrote:

        Hi Anty,

        as far as I know, it is possible to retrieve the source file
        name of the input data within the Mapper's map() method.
        If so, you could use secondary sort on values (have a look at
        the Hadoop wiki pages) to propagate the values sorted first by
        key and second by filename to the Reducer which could aggregate
        them in any particular way.

        Hope that helps
        Thomas
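
The secondary-sort idea can be simulated in plain Java, without Hadoop, to show what the reducer would see. In the sketch below, `SecondarySortSketch`, `Rec`, and `shuffleAndReduce` are hypothetical names. Sorting by (key, file name) stands in for the job's sort comparator, and grouping consecutive records by key alone stands in for the grouping comparator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.StringJoiner;

class SecondarySortSketch {
    // Composite record: natural key, source file name, and value.
    static final class Rec {
        final String key, file, value;
        Rec(String key, String file, String value) {
            this.key = key;
            this.file = file;
            this.value = value;
        }
    }

    static List<String> shuffleAndReduce(List<Rec> mapOutput) {
        List<Rec> sorted = new ArrayList<>(mapOutput);
        // Sort step: natural key first, then file name (the secondary part).
        sorted.sort(Comparator.comparing((Rec r) -> r.key)
                              .thenComparing((Rec r) -> r.file));

        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            String key = sorted.get(i).key;
            StringJoiner values = new StringJoiner(",", key + "-(", ")");
            // Grouping step: consecutive records sharing the natural key
            // are handed to one simulated reduce call, already in file order.
            while (i < sorted.size() && sorted.get(i).key.equals(key)) {
                values.add(sorted.get(i).value);
                i++;
            }
            out.add(values.toString());
        }
        return out;
    }
}
```

The reducer never sorts anything itself; it receives the values of each key already ordered by file name. The output order therefore follows the file names, so either the names or a rank derived from them has to match the order wanted in the result.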


        Anty wrote:

            Would MultipleInputs fit this situation?
            Does anyone have any idea about this?

            On Mon, Oct 26, 2009 at 7:44 PM, Anty wrote:

               Hi all,
               I have the following use case: I have three files, each
               containing key-value pairs:

               file1:                 file2:                 file3:
               key1-value1A           key1-value1B           key1-value1C
               key2-value2A           key2-value2B           key2-value2C
               key3-value3A           key3-value3B           key3-value3C
               ...                    ...                    ...

               Now I want to write an MR job to generate a file,
               file4:
               key1-(value1A,value1B,value1C)
               key2-(value2A,value2B,value2C)
               key3-(value3A,value3B,value3C)
               ...
               Any suggestions will be appreciated.
               --
               Best Regards
               Anty Rao




--
Thomas Thevis
Software Developer
------------------------------------------------------------
vionto GmbH
Karl-Marx-Allee 90a, D-10243 Berlin

fon   +49 30 40 20 3 29 - 28
fax   +49 30 40 20 3 29 - 29
web   http://www.vionto.com
------------------------------------------------------------
Managing Directors: Ralf von Grafenstein, Dr. Martin C. Hirsch
Registered office: Berlin
District Court Berlin Charlottenburg, HRB 108054B
------------------------------------------------------------
