My latest problem is this: I cannot always rely on writing the HashMap to a
file like this:
FileOutputStream fout = new FileOutputStream(f);
ObjectOutputStream objStream = new ObjectOutputStream(fout);
objStream.writeObject(hashMap); // the HashMap instance, not the class
I am doing this writing in the same run() method of the outer class. The file
can be very big, so can I write it in such a manner that the file is
distributed and I can read it easily in the next MapReduce phase? Or, the
other way: can I split the file when it becomes greater than a certain size?
Thanks,
Aayush
On Thu, Apr 17, 2008 at 1:01 PM, Aayush Garg <[EMAIL PROTECTED]> wrote:
> One more thing:
> Will the HashMap that I am generating in the reduce phase be on a single
> node or on multiple nodes in the distributed environment? If my dataset is
> large, will this approach work? If not, what can I do about it?
> The same question applies to the file that I am writing in the run()
> function (opened with a plain FileOutputStream).
>
>
>
>
> On Thu, Apr 17, 2008 at 6:04 AM, Amar Kamat <[EMAIL PROTECTED]> wrote:
>
> > Ted Dunning wrote:
> >
> > > The easiest solution is to not worry too much about running an extra
> > > MR step.
> > >
> > > So,
> > >
> > > - run a first pass to get the counts. Use word count as the pattern.
> > > Store the results in a file.
> > >
> > > - run the second pass. You can now read the hash-table from the file
> > > you stored in pass 1.
> > >
> > > Another approach is to do the counting in your maps as specified and
> > > then, before exiting, emit special records for each key to suppress.
> > > With the correct sort and partition functions, you can make these
> > > killer records appear first in the reduce input. Then, if your reducer
> > > sees the kill flag at the front of the values, it can avoid processing
> > > any extra data.
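The ordering trick above can be sketched in plain Java. In a real Hadoop job
this would need a composite key plus matching sort, partition, and grouping
comparators; the class, field, and flag names below are made up purely for
illustration. The point is that a kill record sorts ahead of data records for
the same word, so the "reducer" loop can drop a whole group as soon as it
sees the flag:

```java
import java.util.*;

public class KillRecordOrdering {
    // Flag values: KILL sorts before DATA for the same word.
    static final int KILL = 0, DATA = 1;

    // Composite "key": the word plus a flag.
    static class Rec {
        final String word;
        final int flag;
        Rec(String w, int f) { word = w; flag = f; }
    }

    // Sort by word first, then by flag, so kill records lead each group.
    static final Comparator<Rec> SORT = new Comparator<Rec>() {
        public int compare(Rec a, Rec b) {
            int c = a.word.compareTo(b.word);
            return c != 0 ? c : Integer.compare(a.flag, b.flag);
        }
    };

    // "Reduce": group by word; skip the group if it starts with KILL.
    public static List<String> survivors(List<Rec> recs) {
        Collections.sort(recs, SORT);
        List<String> kept = new ArrayList<String>();
        int i = 0;
        while (i < recs.size()) {
            String word = recs.get(i).word;
            boolean killed = recs.get(i).flag == KILL;
            while (i < recs.size() && recs.get(i).word.equals(word)) i++;
            if (!killed) kept.add(word);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Rec> recs = new ArrayList<Rec>();
        recs.add(new Rec("the", DATA));
        recs.add(new Rec("the", KILL));
        recs.add(new Rec("cat", DATA));
        System.out.println(survivors(recs)); // "the" group is suppressed
    }
}
```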
> > >
> > >
> > >
> > Ted,
> > Will this work for the case where the cutoff frequency/count requires a
> > global picture? I guess not.
> >
> > > In general, it is better not to try to communicate between map and
> > > reduce except via the expected mechanisms.
> > >
> > >
> > > On 4/16/08 1:33 PM, "Aayush Garg" <[EMAIL PROTECTED]> wrote:
> > >
> > >
> > >
> > > > We cannot read the HashMap in the configure method of the reducer,
> > > > because configure is called before the reduce phase runs.
> > > > I need to eliminate rows from the HashMap once all the keys have
> > > > been read. Also, my concern is: if the dataset is large, will this
> > > > HashMap approach work?
> > > >
> > > >
> > > > On Wed, Apr 16, 2008 at 10:07 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> > > >
> > > >
> > > >
> > > > > That design is fine.
> > > > >
> > > > > You should read your map in the configure method of the reducer.
> > > > >
> > > > > There is a MapFile format supported by Hadoop, but MapFiles tend
> > > > > to be pretty slow. I usually find it better to just load my hash
> > > > > table by hand. If you do this, you should use whatever format you
> > > > > like.
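Loading the table "by hand" might look like the plain-Java sketch below. The
"word<TAB>count" text format is an arbitrary assumption here; in a real job
this loop would sit in the reducer's configure() method and read the file
from the distributed filesystem rather than from an in-memory Reader:

```java
import java.io.*;
import java.util.*;

public class CountsLoader {

    // Load a "word<TAB>count" file into a HashMap, as one might do in the
    // reducer's configure() method before any reduce() calls run.
    public static Map<String, Integer> load(Reader src) throws IOException {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        BufferedReader in = new BufferedReader(src);
        String line;
        while ((line = in.readLine()) != null) {
            int tab = line.indexOf('\t');
            if (tab < 0) continue; // skip malformed lines
            counts.put(line.substring(0, tab),
                       Integer.parseInt(line.substring(tab + 1)));
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Integer> counts =
                load(new StringReader("the\t10\ncat\t3\n"));
        System.out.println(counts.size() + " entries loaded");
    }
}
```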
> > > > >
> > > > >
> > > > > On 4/16/08 12:41 PM, "Aayush Garg" <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > >
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > The current structure of my program is:
> > > > > >
> > > > > > Upper class {
> > > > > >     class Reduce {
> > > > > >         reduce function(K1, V1, K2, V2) {
> > > > > >             // I count the frequency for each key
> > > > > >             // Add output to a HashMap(key, value) instead of
> > > > > >             // output.collect()
> > > > > >         }
> > > > > >     }
> > > > > >
> > > > > >     void run() {
> > > > > >         runjob();
> > > > > >         // Now eliminate the top-frequency keys in the HashMap
> > > > > >         // built in the reduce function, because only now is the
> > > > > >         // hashmap complete.
> > > > > >         // Write this hashmap to a file in such a format that I
> > > > > >         // can use it in the next MapReduce job, with the keys of
> > > > > >         // this hashmap taken as the keys in the mapper function
> > > > > >         // of that MapReduce job. How, and which format should I
> > > > > >         // choose? Is this design and approach OK?
> > > > > >     }
> > > > > >
> > > > > >     public static void main() {}
> > > > > > }
> > > > > >
> > > > > > I hope you have got my question.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > >
> > > > > > On Wed, Apr 16, 2008 at 8:33 AM, Amar Kamat <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > Aayush Garg wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Are you sure that another MR is required for eliminating some
> > > > > > > > rows? Can't I just somehow eliminate them from main() when I
> > > > > > > > know which keys need to be removed?
> > > > > > > >
> > > > > > > Can you provide some more details on how exactly you are
> > > > > > > filtering?
> > > > > > > Amar