My latest problem is this: I cannot always rely on writing the HashMap to a
file like this:
FileOutputStream fout = new FileOutputStream(f);
ObjectOutputStream objStream = new ObjectOutputStream(fout);
objStream.writeObject(hashMap); // the HashMap instance, not the class
I am doing this writing in the same run() method of the outer class. The file
can be very big, so can I write it in such a manner that the file is
distributed and I can read it easily in the next MapReduce phase? Or, the
other way: can I split the file when it becomes greater than a certain size?
Thanks,
Aayush
On Thu, Apr 17, 2008 at 1:01 PM, Aayush Garg <[EMAIL PROTECTED]> wrote:
> One more thing:
> Will the HashMap that I am generating in the reduce phase be on a single
> node or on multiple nodes in the distributed environment? If my dataset is
> large, will this approach work? If not, what can I do about it?
> The same question applies to the file that I am writing in the run()
> function (opened with a plain FileOutputStream).
>
>
>
>
> On Thu, Apr 17, 2008 at 6:04 AM, Amar Kamat <[EMAIL PROTECTED]> wrote:
>
> > Ted Dunning wrote:
> >
> > > The easiest solution is to not worry too much about running an extra
> > > MR step.
> > >
> > > So,
> > >
> > > - run a first pass to get the counts. Use word count as the pattern.
> > > Store the results in a file.
> > >
> > > - run the second pass. You can now read the hash-table from the file
> > > you stored in pass 1.
> > >
> > > Another approach is to do the counting in your maps as specified and
> > > then, before exiting, emit special records for each key to suppress.
> > > With the correct sort and partition functions, you can make these
> > > killer records appear first in the reduce input. Then, if your reducer
> > > sees the kill flag at the front of the values, it can avoid processing
> > > any extra data.
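The ordering trick above can be sketched in plain Java. In a real Hadoop job
this would need a composite key plus matching sort, partition, and grouping
comparators; the class, field, and flag names below are made up purely for
illustration. The point is that a kill record sorts ahead of data records for
the same word, so the "reducer" loop can drop a whole group as soon as it
sees the flag:

```java
import java.util.*;

public class KillRecordOrdering {
    // Flag values: KILL sorts before DATA for the same word.
    static final int KILL = 0, DATA = 1;

    // Composite "key": the word plus a flag.
    static class Rec {
        final String word;
        final int flag;
        Rec(String w, int f) { word = w; flag = f; }
    }

    // Sort by word first, then by flag, so kill records lead each group.
    static final Comparator<Rec> SORT = new Comparator<Rec>() {
        public int compare(Rec a, Rec b) {
            int c = a.word.compareTo(b.word);
            return c != 0 ? c : Integer.compare(a.flag, b.flag);
        }
    };

    // "Reduce": group by word; skip the group if it starts with KILL.
    public static List<String> survivors(List<Rec> recs) {
        Collections.sort(recs, SORT);
        List<String> kept = new ArrayList<String>();
        int i = 0;
        while (i < recs.size()) {
            String word = recs.get(i).word;
            boolean killed = recs.get(i).flag == KILL;
            while (i < recs.size() && recs.get(i).word.equals(word)) i++;
            if (!killed) kept.add(word);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Rec> recs = new ArrayList<Rec>();
        recs.add(new Rec("the", DATA));
        recs.add(new Rec("the", KILL));
        recs.add(new Rec("cat", DATA));
        System.out.println(survivors(recs)); // "the" group is suppressed
    }
}
```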
> > >
> > >
> > >
> > Ted,
> > Will this work for the case where the cutoff frequency/count requires a
> > global picture? I guess not.
> >
> > > In general, it is better not to try to communicate between map and
> > > reduce except via the expected mechanisms.
> > >
> > >
> > > On 4/16/08 1:33 PM, "Aayush Garg" <[EMAIL PROTECTED]> wrote:
> > >
> > >
> > >
> > > > We cannot read the HashMap in the configure method of the reducer,
> > > > because configure is called before the reduce phase runs.
> > > > I need to eliminate rows from the HashMap once all the keys have
> > > > been read. Also, my concern is: if the dataset is large, will this
> > > > HashMap approach work?
> > > >
> > > >
> > > > On Wed, Apr 16, 2008 at 10:07 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> > > >
> > > >
> > > >
> > > > > That design is fine.
> > > > >
> > > > > You should read your map in the configure method of the reducer.
> > > > >
> > > > > There is a MapFile format supported by Hadoop, but MapFiles tend
> > > > > to be pretty slow. I usually find it better to just load my hash
> > > > > table by hand. If you do this, you should use whatever format you
> > > > > like.
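Loading the table "by hand" might look like the plain-Java sketch below. The
"word<TAB>count" text format is an arbitrary assumption here; in a real job
this loop would sit in the reducer's configure() method and read the file
from the distributed filesystem rather than from an in-memory Reader:

```java
import java.io.*;
import java.util.*;

public class CountsLoader {

    // Load a "word<TAB>count" file into a HashMap, as one might do in the
    // reducer's configure() method before any reduce() calls run.
    public static Map<String, Integer> load(Reader src) throws IOException {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        BufferedReader in = new BufferedReader(src);
        String line;
        while ((line = in.readLine()) != null) {
            int tab = line.indexOf('\t');
            if (tab < 0) continue; // skip malformed lines
            counts.put(line.substring(0, tab),
                       Integer.parseInt(line.substring(tab + 1)));
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Integer> counts =
                load(new StringReader("the\t10\ncat\t3\n"));
        System.out.println(counts.size() + " entries loaded");
    }
}
```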
> > > > >
> > > > >
> > > > > On 4/16/08 12:41 PM, "Aayush Garg" <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > >
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > The current structure of my program is:
> > > > > >
> > > > > > Upper class {
> > > > > >     class Reduce {
> > > > > >         reduce function(K1, V1, K2, V2) {
> > > > > >             // I count the frequency for each key
> > > > > >             // Add output to a HashMap(key, value) instead of
> > > > > >             // output.collect()
> > > > > >         }
> > > > > >     }
> > > > > >
> > > > > >     void run() {
> > > > > >         runjob();
> > > > > >         // Now eliminate the top-frequency keys in the HashMap
> > > > > >         // built in the reduce function, because only now is the
> > > > > >         // hashmap complete.
> > > > > >         // Write this hashmap to a file in such a format that I
> > > > > >         // can use it in the next MapReduce job, with the keys of
> > > > > >         // this hashmap taken as the keys in the mapper function
> > > > > >         // of that MapReduce job. How, and which format should I
> > > > > >         // choose? Is this design and approach OK?
> > > > > >     }
> > > > > >
> > > > > >     public static void main() {}
> > > > > > }
> > > > > >
> > > > > > I hope you have got my question.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > >
> > > > > > On Wed, Apr 16, 2008 at 8:33 AM, Amar Kamat <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > Aayush Garg wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Are you sure that another MR is required for eliminating some
> > > > > > > > rows? Can't I just somehow eliminate them from main() when I
> > > > > > > > know which keys need to be removed?
> > > > > > > >
> > > > > > > Can you provide some more details on how exactly you are
> > > > > > > filtering?
> > > > > > > Amar