Exactly, Matthew. The weird thought was in that direction. Basically I have a tilde-separated input which has to undergo an aggregation operation, so I was just taking a shot at seeing whether it is possible to go straight into the sort/shuffle phase and then the reducer, without a mapper. I know I need to depend on at least an IdentityMapper.

A small query on top of this: if we take a basic MapReduce job, say word count without a combiner, what would the percentage distribution of execution time be across the map, sort/shuffle and reduce phases?
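To make the flow I have in mind concrete, here is a minimal plain-Java sketch (no Hadoop dependencies, so it only simulates the phases in memory). It assumes each record looks like key~value with a numeric value and that the aggregation is a sum; both are assumptions for illustration, and the class and method names are hypothetical rather than anything from the Hadoop API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TildeAggregationSketch {

    /** Pass-through "map": only splits a record into (key, value) at the first tilde. */
    static Map.Entry<String, Long> passThroughMap(String line) {
        int sep = line.indexOf('~');
        if (sep < 0) {
            throw new IllegalArgumentException("No tilde separator in: " + line);
        }
        String key = line.substring(0, sep);
        long value = Long.parseLong(line.substring(sep + 1).trim());
        return new SimpleEntry<>(key, value);
    }

    /** Simulated sort/shuffle: groups values by key, with keys kept in sorted order. */
    static TreeMap<String, List<Long>> shuffle(List<Map.Entry<String, Long>> pairs) {
        TreeMap<String, List<Long>> grouped = new TreeMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    /** "Reduce": aggregates (here, sums) all values seen for one key. */
    static long reduce(List<Long> values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum;
    }

    /** Runs the three simulated phases over the input lines. */
    static Map<String, Long> run(List<String> lines) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.add(passThroughMap(line));
        }
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, List<Long>> group : shuffle(pairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {a=1, b=7}
        System.out.println(run(Arrays.asList("b~2", "a~1", "b~5")));
    }
}
```

Even in this toy version the "map" step cannot disappear entirely: something has to turn a raw line into a key and a value before the grouping can happen. In a real job with the default text input, the map input key is the byte offset of the line, so a pure identity mapper would hand the sort/shuffle phase offsets rather than the aggregation key; the key extraction would have to come from the input format or from a thin mapper like the one sketched here.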

On Wed, Sep 7, 2011 at 10:30 PM, GOEKE, MATTHEW (AG/1000) <matthew.go...@monsanto.com> wrote:

> Bejoy,
>
> What exactly is your use case? I know down below you said you were just
> thinking of a weird design, but it would really help if we knew exactly what
> you were shooting for, because we might be able to refactor it.
>
> I have a job that I developed that still required the input to be sorted
> for the reduce, but I did not need to do any transformation or filtering on
> the map side, so I just used an identity mapper, as Robert mentions below,
> and it works perfectly. I do not think that there is any way to pass
> data directly into the S/S phase without going through the map phase (if
> that is what you were hinting at), and if you don’t require the data to go
> through S/S then you can make it a map-only job.
>
> Matt
>
> From: Robert Hafner [mailto:ted...@tedivm.com]
> Sent: Wednesday, September 07, 2011 11:34 AM
> To: mapreduce-user@hadoop.apache.org
> Subject: Re: No Mapper but Reducer
>
> You could just have a mapper which sent off the exact values it took in
> (i.e., output k1,v1 as k2,v2). I think that's the best you'll be able to do
> here.
>
> On Sep 7, 2011, at 4:21 AM, Bejoy KS <bejoy.had...@gmail.com> wrote:
>
> Thank You All. Even I have noticed this strange behavior some time back.
> Now my initial concern still remains. If I provide an empty directory as my
> input, yes, the map tasks won't be executed. But my reducer needs input
> to do the processing/aggregation. In such a scenario, is there an option to
> provide input just to the reducer?
>
> Regards
> Bejoy.K.S
>
> On Wed, Sep 7, 2011 at 3:09 PM, Sudharsan Sampath <sudha...@gmail.com> wrote:
>
> This is true, and it took us by surprise in the recent past. Also, it had
> quite some impact on our job cycles, where the size of the input is totally
> random and could also be zero at times.
>
> In one of our cycles, we run a lot of jobs. Say we configure X as the number
> of reducers for a job which does not have any input.
>
> Y -> Number of tasktrackers in the cluster
> H -> Time interval for heartbeat response
>
> With the cdh2 version, the job takes
>
> (X / Y) * H seconds to complete without doing any work, since we assign
> only one reduce task per heartbeat.
>
> If the number of such jobs in the cycle is large, then the total time that
> the cluster spends doing nothing accumulates.
>
> I was thinking of raising this as a JIRA but am not sure. Should we raise and
> fix this as a JIRA request? Could the number of reducers set by the client be
> overridden if the number of mappers is 0?
>
> We have a way to hack around it, by verifying the existence of the input path
> to the map phase ourselves, but just thought it would be more intuitive for
> the framework to handle it itself.
>
> -Sudhan S
>
> On Wed, Sep 7, 2011 at 2:25 PM, Harsh J <ha...@cloudera.com> wrote:
>
> Oh boy are you in for a surprise. Reducers _can_ run with 0 mappers in a
> job ;-)
>
> /me puts his troll-mask on.
>
> ➜ ~HADOOP_HOME hadoop fs -mkdir abc
> ➜ ~HADOOP_HOME hadoop jar hadoop-examples-0.20.2-cdh3u1.jar wordcount abc out
> 11/09/07 14:24:14 INFO input.FileInputFormat: Total input paths to process : 0
> 11/09/07 14:24:14 INFO mapred.JobClient: Running job: job_201109071413_0001
> 11/09/07 14:24:15 INFO mapred.JobClient: map 0% reduce 0%
> 11/09/07 14:24:21 INFO mapred.JobClient: map 0% reduce 100%
> 11/09/07 14:24:22 INFO mapred.JobClient: Job complete: job_201109071413_0001
> 11/09/07 14:24:22 INFO mapred.JobClient: Counters: 13
> 11/09/07 14:24:22 INFO mapred.JobClient: Job Counters
> 11/09/07 14:24:22 INFO mapred.JobClient: Launched reduce tasks=1
> 11/09/07 14:24:22 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=2209
> 11/09/07 14:24:22 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
> 11/09/07 14:24:22 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
> 11/09/07 14:24:22 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=3113
> 11/09/07 14:24:22 INFO mapred.JobClient: FileSystemCounters
> 11/09/07 14:24:22 INFO mapred.JobClient: FILE_BYTES_WRITTEN=59220
> 11/09/07 14:24:22 INFO mapred.JobClient: Map-Reduce Framework
> 11/09/07 14:24:22 INFO mapred.JobClient: Reduce input groups=0
> 11/09/07 14:24:22 INFO mapred.JobClient: Combine output records=0
> 11/09/07 14:24:22 INFO mapred.JobClient: Reduce shuffle bytes=0
> 11/09/07 14:24:22 INFO mapred.JobClient: Reduce output records=0
> 11/09/07 14:24:22 INFO mapred.JobClient: Spilled Records=0
> 11/09/07 14:24:22 INFO mapred.JobClient: Combine input records=0
> 11/09/07 14:24:22 INFO mapred.JobClient: Reduce input records=0
>
> /me takes off troll mask.
>
> On Wed, Sep 7, 2011 at 1:30 PM, Bejoy KS <bejoy.had...@gmail.com> wrote:
> > Thanks Sonal. I was just thinking of some weird design and wanted to make
> > sure whether there is a possibility like that: no maps and all reducers.
> >
> > On Wed, Sep 7, 2011 at 1:22 PM, Sonal Goyal <sonalgoy...@gmail.com> wrote:
> >>
> >> I don't think that is possible. Can you explain in what scenario you want
> >> to have no mappers, only reducers?
> >> Best Regards,
> >> Sonal
> >> Crux: Reporting for HBase
> >> Nube Technologies
> >>
> >> On Wed, Sep 7, 2011 at 1:18 PM, Bejoy KS <bejoy.had...@gmail.com> wrote:
> >>>
> >>> Hi
> >>> I have a query here. Is it possible to have no mappers but
> >>> reducers alone? AFAIK, if we need to avoid the triggering of reducers we
> >>> can set numReduceTasks to zero, but such a setting on the mapper won't
> >>> work. So how can it be achieved, if possible?
> >>>
> >>> Thank You
> >>>
> >>> Regards
> >>> Bejoy.K.S
>
> --
> Harsh J