Or you can put the further processing in another map-reduce job, making the whole thing a chain of map-reduce jobs.
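A chain like this can be sketched with the new-API `Job` driver below. This is a minimal sketch, not code from the thread: the class names (`JobChainDriver`, `FirstMapper`, `FirstReducer`, `SecondMapper`, `SecondReducer`) and the path arguments are placeholders; the point is only that the second job reads the files the first job wrote, and that the driver checks `waitForCompletion` before starting the next stage.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class JobChainDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path input = new Path(args[0]);
        Path intermediate = new Path(args[1]);  // handoff between the jobs
        Path output = new Path(args[2]);

        // First job: the original computation.
        Job first = new Job(conf, "first-pass");
        first.setJarByClass(JobChainDriver.class);
        first.setMapperClass(FirstMapper.class);      // placeholder classes
        first.setReducerClass(FirstReducer.class);
        first.setOutputKeyClass(Text.class);
        first.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(first, input);
        FileOutputFormat.setOutputPath(first, intermediate);
        if (!first.waitForCompletion(true)) {
            System.exit(1);  // stop the chain if the first job fails
        }

        // Second job: the "further processing", fed by the first job's output files.
        Job second = new Job(conf, "further-processing");
        second.setJarByClass(JobChainDriver.class);
        second.setMapperClass(SecondMapper.class);
        second.setReducerClass(SecondReducer.class);
        second.setOutputKeyClass(Text.class);
        second.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(second, intermediate);
        FileOutputFormat.setOutputPath(second, output);
        System.exit(second.waitForCompletion(true) ? 0 : 1);
    }
}
```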
Jeff Zhang

On Fri, Nov 27, 2009 at 10:38 PM, Jeff Zhang <[email protected]> wrote:

> Hi Liu,
>
> The reducer task runs in its own JVM, so you have to put your modules into
> the reducer task if you really want to access the output in memory.
>
> I am not sure of the size of your output. If it is not large, I suggest
> putting it in a message, wrapping your modules in a listener, and sending
> the message to that listener for further processing.
>
> If the size of your output is large, I suggest you store it in HDFS, put
> the location in a message, and send the message to the listener.
> Because you said your modules are complicated, I suggest you separate them
> into map-reduce jobs as I mentioned above; it will improve the
> maintainability and extensibility of your system.
>
> Jeff Zhang
>
> On Fri, Nov 27, 2009 at 9:45 PM, Liu Xianglong <[email protected]> wrote:
>
>> Hi, Jeff. Thanks for your reply. Actually, I will do further processing of
>> the map-reduce output. If I cannot store it in memory, other modules cannot
>> process it. If these modules are integrated into map-reduce, then they will
>> finish the processing inside the map-reduce jobs. The problem is that these
>> modules are complicated. The easy way would be to store the output of the
>> jobs in memory. What do you think? Do you have such experience?
>>
>> --------------------------------------------------
>> From: "Jeff Zhang" <[email protected]>
>> Sent: Friday, November 27, 2009 10:46 PM
>> To: <[email protected]>
>> Subject: Re: Store mapreduce output into my own data structures
>>
>>> So how do you plan to integrate your other modules with hadoop?
>>>
>>> Put them in the reduce phase?
>>>
>>> Jeff Zhang
>>>
>>> On Fri, Nov 27, 2009 at 6:37 AM, <[email protected]> wrote:
>>>
>>>> Actually I want the output to be usable by other modules. So they have
>>>> to read the output from HDFS files? Or integrate these modules into
>>>> map-reduce? Are there other ways?
>>>>
>>>> --------------------------------------------------
>>>> From: "Jeff Zhang" <[email protected]>
>>>> Sent: Friday, November 27, 2009 10:00 PM
>>>> To: <[email protected]>
>>>> Subject: Re: Store mapreduce output into my own data structures
>>>>
>>>>> Hi Liu,
>>>>>
>>>>> Why do you want to store the output in memory? You cannot use the
>>>>> output outside of the reducer.
>>>>> Actually, at the beginning the output of the reducer is in memory, and
>>>>> then the OutputFormat writes these data to the file system or another
>>>>> data store.
>>>>>
>>>>> Jeff Zhang
>>>>>
>>>>> 2009/11/27 Liu Xianglong <[email protected]>
>>>>>
>>>>>> Hi, everyone. Is there someone who uses map-reduce to store the reduce
>>>>>> output in memory? I mean, currently the output path of the job is set
>>>>>> and the reduce outputs are stored in files under this path (see the
>>>>>> comments along with the following code):
>>>>>>
>>>>>> job.setOutputFormatClass(MyOutputFormat.class);
>>>>>> // can I implement my OutputFormat to store these output key-value
>>>>>> // pairs in my data structures, or are there other ways to do it?
>>>>>> job.setOutputKeyClass(ImmutableBytesWritable.class);
>>>>>> job.setOutputValueClass(Result.class);
>>>>>> FileOutputFormat.setOutputPath(job, outputDir);
>>>>>>
>>>>>> Is there any way to store them in some variables or data structures?
>>>>>> Then how can I implement my OutputFormat? Any suggestions and code are
>>>>>> welcome.
>>>>>>
>>>>>> Another question: is there some way to set the number of map tasks? It
>>>>>> seems there is no API to do this in the new Hadoop job APIs. I am not
>>>>>> sure of the way to set this number.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Best Wishes!
>>>>>> _____________________________________________________________
>>>>>>
>>>>>> 刘祥龙 Liu Xianglong
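A custom OutputFormat along the lines Liu asks about can be sketched as below. This is a sketch only, with the caveat Jeff raises built in: the static list lives inside the reduce task's JVM, so only modules running in that same task can see it. `InMemoryOutputFormat` is a made-up name, not a Hadoop class; the no-op committer mimics what `NullOutputFormat` does.

```java
import java.io.IOException;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.OutputCommitter;
import org.apache.hadoop.mapreduce.OutputFormat;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// Sketch: collect reduce output into an in-memory list instead of files.
// The list exists only inside the reduce task's JVM, so it is useful only
// to modules invoked from within that task (e.g. from close()).
public class InMemoryOutputFormat<K, V> extends OutputFormat<K, V> {

    public static final List<SimpleEntry<Object, Object>> RESULTS =
            new ArrayList<SimpleEntry<Object, Object>>();

    @Override
    public RecordWriter<K, V> getRecordWriter(TaskAttemptContext context) {
        return new RecordWriter<K, V>() {
            @Override
            public void write(K key, V value) {
                RESULTS.add(new SimpleEntry<Object, Object>(key, value));
            }
            @Override
            public void close(TaskAttemptContext context) {
                // hand RESULTS to the in-task modules here
            }
        };
    }

    @Override
    public void checkOutputSpecs(JobContext context) {
        // nothing to check: no output path is used
    }

    @Override
    public OutputCommitter getOutputCommitter(TaskAttemptContext context)
            throws IOException {
        // A do-nothing committer, since there are no files to commit.
        return new OutputCommitter() {
            public void setupJob(JobContext c) { }
            public void setupTask(TaskAttemptContext c) { }
            public boolean needsTaskCommit(TaskAttemptContext c) { return false; }
            public void commitTask(TaskAttemptContext c) { }
            public void abortTask(TaskAttemptContext c) { }
        };
    }
}
```

It would be plugged in with `job.setOutputFormatClass(InMemoryOutputFormat.class);` in place of the `FileOutputFormat` setup, but given the JVM-isolation caveat, the HDFS-plus-notification approach from earlier in the thread scales better.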

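Jeff's "store the output in HDFS, put the location in a message, and send it to a listener" suggestion can be sketched in plain Java. Everything here is hypothetical naming, not Hadoop API: `OutputListener` and `OutputNotifier` are made-up types, and the driver would call `publish` after `waitForCompletion` succeeds, passing only the output path rather than the data itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback the downstream modules implement: they receive
// only the HDFS location of the finished job's output, not the data.
interface OutputListener {
    void onJobOutput(String hdfsPath);
}

// A registry the job driver notifies once the map-reduce job completes.
class OutputNotifier {
    private final List<OutputListener> listeners = new ArrayList<>();

    void register(OutputListener listener) {
        listeners.add(listener);
    }

    // Called after job.waitForCompletion(true) returns true.
    void publish(String hdfsPath) {
        for (OutputListener listener : listeners) {
            listener.onJobOutput(hdfsPath);
        }
    }
}

public class ListenerSketch {
    public static void main(String[] args) {
        OutputNotifier notifier = new OutputNotifier();
        List<String> received = new ArrayList<>();
        notifier.register(received::add);  // a module that records the path
        // Simulate the driver announcing where the job wrote its output:
        notifier.publish("hdfs://namenode/user/liu/job-output");
        System.out.println(received.get(0));
    }
}
```

Each downstream module then opens the path itself, so the reduce output never has to fit in the driver's memory.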