Replies inline.

On Mon, Nov 2, 2009 at 3:15 AM, Zhang Bingjun (Eddy) <[email protected]> wrote:
> Dear Khurana,
>
> We didn't use MapRunnable. Instead, we used the package
> org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper directly and
> passed our normal Mapper class to it via its setMapperClass() method. We
> set the number of threads using setNumberOfThreads(). Is this a correct
> way of doing a multithreaded mapper?

I was just curious how you did it. This is the right way, afaik.

> We noticed that hadoop-0.20.1 has another MultithreadedMapper,
> org.apache.hadoop.mapred.lib.map.MultithreadedMapper, but we didn't
> touch it.

That's the deprecated package. You used the correct one.

> It might be that some thread didn't return. We need to do some work to
> confirm that. We will also try to enable DEBUG mode in Hadoop. Could you
> share some info on starting a Hadoop daemon or the whole cluster in
> debug mode?

You'll have to edit the log4j.properties file in $HADOOP_HOME/conf/. After
editing, you'll have to restart the daemons (or the entire cluster). The
DEBUG logs might give some more info on what's happening.

> Thanks a lot!
>
> Best regards,
> Zhang Bingjun (Eddy)
>
> E-mail: [email protected], [email protected], [email protected]
> Tel No: +65-96188110 (M)
>
>
> On Mon, Nov 2, 2009 at 6:58 PM, Zhang Bingjun (Eddy) <[email protected]> wrote:
>
> > Hi all,
> >
> > An important observation: the 100% mappers that never complete all have
> > temporary files of exactly 64MB, which means the output of the mapper
> > is cut off at the block boundary. However, we do have some successfully
> > completed mappers with output files larger than 64MB, and we also have
> > some mappers below 100% with temporary files larger than 64MB.
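As a reference for the DEBUG suggestion above, here is a minimal sketch of the log4j.properties edit. The property and logger names follow the stock 0.20 conf/ file, but treat them as assumptions and check them against your own copy:

```properties
# $HADOOP_HOME/conf/log4j.properties (sketch)

# Raise the default logging level from INFO to DEBUG for all daemons:
hadoop.root.logger=DEBUG,console

# Or, less noisily, turn on DEBUG only for the MapReduce classes
# (the package name here is illustrative; narrow it to what you need):
log4j.logger.org.apache.hadoop.mapred=DEBUG
```

Restart the affected daemons afterwards so the new level takes effect.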
> > Here is the info returned by "hadoop fs -ls
> > /hadoop/music/track/audio/track_1/_temporary/_attempt_200911021416_0001_m_000091_0":
> >
> > -rw-r--r--   3 hadoop supergroup   67108864 2009-11-02 14:29
> > /hadoop/music/track/audio/track_1/_temporary/_attempt_200911021416_0001_m_000091_0/part-m-00091
> >
> > This is the temporary file of a 100% mapper that never completed.
> >
> > Any clues on this?
> >
> > Best regards,
> > Zhang Bingjun (Eddy)
> >
> > E-mail: [email protected], [email protected], [email protected]
> > Tel No: +65-96188110 (M)
> >
> >
> > On Mon, Nov 2, 2009 at 6:52 PM, Amandeep Khurana <[email protected]> wrote:
> >
> >> On Mon, Nov 2, 2009 at 2:40 AM, Zhang Bingjun (Eddy) <[email protected]> wrote:
> >>
> >> > Hi Pallavi, Khurana, and Vasekar,
> >> >
> >> > Thanks a lot for your replies. To add: the mapper we are using is the
> >> > multithreaded mapper.
> >>
> >> How are you doing this? Did you write your own MapRunnable?
> >>
> >> > To answer your questions:
> >> >
> >> > Pallavi, Khurana: I have checked the logs. The key it got stuck on is
> >> > the last key it read in. Since the progress is 100%, I suppose that is
> >> > the last key? From the stdout log of our mapper, we confirmed that the
> >> > map function has completed. After that, no more keys were read in and
> >> > no further progress was made by the mapper, which means it didn't
> >> > complete / commit despite being at 100%. Each job has a different
> >> > number of stuck mappers, but it is roughly one third to half of them.
> >> > From the stdout logs, we also confirmed that the map function of the
> >> > mapper has finished. That's why we started to suspect that the
> >> > MapReduce framework has something to do with the stuck problem.
> >> > Here is the log from stdout:
> >> >
> >> > [entry] [293419] <track><name>i bealive</name><artist>Simian Mobile Disco</artist></track>
> >> > [0] [293419] start creating objects
> >> > [1] [293419] start parsing xml
> >> > [2] [293419] start updating data
> >> > [sleep] [228312]
> >> > [error] [228312] java.io.IOException: [error] [228312] reaches the maximum number of attempts whiling updating
> >> > [3] [228312] start collecting output228312
> >> > [3.1 done with null] [228312] done228312
> >> > [fail] [228312] java.io.IOException: 3.1 throw null228312
> >> > [done] [228312] done228312
> >> > [sleep] [293419]
> >> > [error] [293419] java.io.IOException: [error] [293419] reaches the maximum number of attempts whiling updating
> >> > [3] [293419] start collecting output293419
> >> > [3.1 done with null] [293419] done293419
> >> > [fail] [293419] java.io.IOException: 3.1 throw null293419
> >> > [done] [293419] done293419
> >> >
> >> > Here is the log from the tasktracker:
> >> >
> >> > 2009-11-02 16:58:23,518 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200911021416_0001_m_000047_1 1.0% name: 梟 artist: Plastic Tree
> >> > 2009-11-02 16:58:50,527 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200911021416_0001_m_000047_1 1.0% name: Zydeko artist: Cirque du Soleil
> >> > 2009-11-02 16:59:23,539 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200911021416_0001_m_000047_1 1.0% name: www.China.ie artist: www.China.ie
> >> > 2009-11-02 16:59:50,550 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200911021416_0001_m_000047_1 1.0% name: www.China.ie artist: www.China.ie
> >> > 2009-11-02 17:00:11,560 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian Mobile Disco
> >> > 2009-11-02 17:00:23,565 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian Mobile Disco
> >> > 2009-11-02 17:01:11,585 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian Mobile Disco
> >> >
> >> > From these logs, we can see that the last entry read in is "i bealive
> >> > artist: Simian Mobile Disco". The last entry processed in the mapper
> >> > is the same as this entry, and from the stdout log we can see the map
> >> > function has finished....
> >>
> >> Put some stdout or logging code towards the end of the mapper and also
> >> check if all threads are coming back. Do you think it could be some
> >> issue with the threads?
> >>
> >> > Vasekar: The HDFS is healthy. We haven't stored too many small files
> >> > in it yet. The output of the command "hadoop fsck /" is as follows:
> >> >
> >> >  Total size:    89114318394 B (Total open files size: 19845943808 B)
> >> >  Total dirs:    430
> >> >  Total files:   1761 (Files currently being written: 137)
> >> >  Total blocks (validated):      2691 (avg. block size 33115688 B) (Total open file blocks (not validated): 309)
> >> >  Minimally replicated blocks:   2691 (100.0 %)
> >> >  Over-replicated blocks:        0 (0.0 %)
> >> >  Under-replicated blocks:       0 (0.0 %)
> >> >  Mis-replicated blocks:         0 (0.0 %)
> >> >  Default replication factor:    3
> >> >  Average block replication:     2.802304
> >> >  Corrupt blocks:                0
> >> >  Missing replicas:              0 (0.0 %)
> >> >  Number of data-nodes:          76
> >> >  Number of racks:               1
> >> >
> >> > Is this problem possibly due to stuck communication between the actual
> >> > task (the mapper) and the tasktracker? From the logs, we cannot see
> >> > anything after it gets stuck.
> >>
> >> The TT and JT logs would show if there is lost communication. Enable
> >> DEBUG logging for the processes and keep a tab.
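The "check if all threads are coming back" suggestion can be illustrated with a small self-contained sketch (plain Java, no Hadoop dependencies; the fixed pool merely stands in for MultithreadedMapper's worker threads and is not the framework's actual code). If even one worker never returns, the pool never terminates, which matches a task that reports 100% but never commits:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class StuckThreadDemo {
    // Returns true if all worker threads finished within the timeout.
    static boolean allThreadsReturned(Runnable work, int nThreads, long timeoutMs)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        for (int i = 0; i < nThreads; i++) {
            pool.submit(work);
        }
        pool.shutdown(); // accept no new tasks; wait for the submitted ones
        boolean done = pool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        pool.shutdownNow(); // interrupt any stuck workers so the JVM can exit
        return done;
    }

    public static void main(String[] args) throws InterruptedException {
        // Well-behaved "map" work returns promptly: prints true.
        System.out.println(allThreadsReturned(() -> { }, 4, 1000));

        // Work that blocks forever models a mapper thread that never
        // returns: the pool never terminates, so this prints false.
        CountDownLatch never = new CountDownLatch(1);
        Runnable stuck = () -> {
            try {
                never.await();
            } catch (InterruptedException e) {
                // released by shutdownNow()
            }
        };
        System.out.println(allThreadsReturned(stuck, 4, 500));
    }
}
```

A thread dump (kill -QUIT on the stuck task's JVM) would show the same thing directly: worker threads parked on whatever call never returns.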
> >> > From: Amandeep Khurana <[email protected]>
> >> > Reply-To: [email protected]
> >> > To: [email protected]
> >> > Date: Mon, Nov 2, 2009 at 4:36 PM
> >> > Subject: Re: too many 100% mapper does not complete / finish / commit
> >> >
> >> > Did you try to add any logging and see what keys they are getting
> >> > stuck on, or what the last keys they processed were? Do the same
> >> > number of mappers get stuck every time?
> >> >
> >> > Not having reducers is not a problem. It's pretty normal to do that.
> >> >
> >> > From: Amogh Vasekar <[email protected]>
> >> > Reply-To: [email protected]
> >> > To: "[email protected]" <[email protected]>
> >> > Date: Mon, Nov 2, 2009 at 4:50 PM
> >> > Subject: Re: too many 100% mapper does not complete / finish / commit
> >> >
> >> > Hi,
> >> > Quick questions...
> >> > Are you creating too many small files?
> >> > Are there any task-side files being created?
> >> > Does the heap for the NN have enough space to list the metadata? Any
> >> > details on its general health will probably be helpful to people on
> >> > the list.
> >> >
> >> > Amogh
> >> >
> >> > Best regards,
> >> > Zhang Bingjun (Eddy)
> >> >
> >> > E-mail: [email protected], [email protected], [email protected]
> >> > Tel No: +65-96188110 (M)
> >> >
> >> >
> >> > On Mon, Nov 2, 2009 at 4:51 PM, Palleti, Pallavi <[email protected]> wrote:
> >> >
> >> > > Hi Eddy,
> >> > >
> >> > > I faced a similar issue when I used a Pig script for fetching
> >> > > webpages for certain URLs.
> >> > > I could see the map phase showing 100% while it was still running.
> >> > > As I was logging the page currently being fetched, I could see the
> >> > > process hadn't yet finished. It might be the same issue. So, you can
> >> > > add logging to check whether it is actually stuck or the process is
> >> > > still going on.
> >> > >
> >> > > Thanks
> >> > > Pallavi
> >> > >
> >> > > ________________________________
> >> > >
> >> > > From: Zhang Bingjun (Eddy) [mailto:[email protected]]
> >> > > Sent: Monday, November 02, 2009 2:03 PM
> >> > > To: [email protected]; [email protected]; [email protected]; [email protected]
> >> > > Subject: too many 100% mapper does not complete / finish / commit
> >> > >
> >> > > Dear hadoop fellows,
> >> > >
> >> > > We have been using Hadoop-0.20.1 MapReduce to crawl some web data.
> >> > > In this case, we only have mappers, which crawl data and save it
> >> > > into HDFS in a distributed way. No reducers are specified in the
> >> > > job conf.
> >> > >
> >> > > The problem is that in every job about one third of the mappers get
> >> > > stuck at 100% progress but never complete. If we look at the
> >> > > tasktracker logs of those mappers, the last log line is the key
> >> > > input INFO line, and no other logs are output after that.
> >> > >
> >> > > From the stdout log of a specific attempt of one of those mappers,
> >> > > we can see that the map function of the mapper has finished
> >> > > completely, so control of the execution should be somewhere in the
> >> > > MapReduce framework.
> >> > >
> >> > > Does anyone have any clue about this problem? Is it because we
> >> > > didn't use any reducers? Since two thirds of the mappers could
> >> > > complete successfully and commit their output data into HDFS, I
> >> > > suspect the stuck mappers have something to do with the MapReduce
> >> > > framework code?
> >> > >
> >> > > Any input will be appreciated. Thanks a lot!
> >> > >
> >> > > Best regards,
> >> > > Zhang Bingjun (Eddy)
> >> > >
> >> > > E-mail: [email protected], [email protected], [email protected]
> >> > > Tel No: +65-96188110 (M)
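Pallavi's suggestion earlier in the thread — log the item currently being processed so you can tell a stuck task from a merely slow one — can be sketched in plain Java. This is an illustration only (no Hadoop dependencies; class names and intervals are made up for the example): a daemon watchdog thread periodically reports the last key seen, so if the report stops changing while the task stays alive, the loop is stuck:

```java
import java.util.concurrent.atomic.AtomicReference;

public class ProgressWatchdog {
    public static void main(String[] args) throws InterruptedException {
        // Shared cell the "map loop" updates and the watchdog reads.
        AtomicReference<String> lastKey = new AtomicReference<>("<none>");

        // Watchdog: periodically report the last key processed.
        Thread watchdog = new Thread(() -> {
            try {
                while (true) {
                    Thread.sleep(100);
                    System.out.println("[watchdog] last key: " + lastKey.get());
                }
            } catch (InterruptedException e) {
                // normal shutdown
            }
        });
        watchdog.setDaemon(true);
        watchdog.start();

        // Simulated map loop over a few records.
        for (String key : new String[] {"k1", "k2", "k3"}) {
            lastKey.set(key);
            Thread.sleep(120); // stand-in for per-record work
        }

        watchdog.interrupt();
        watchdog.join();
        System.out.println("map loop finished; last key = " + lastKey.get());
    }
}
```

In a real mapper you would log from within map() itself (or use the task's status/counter reporting) rather than printing, but the stuck-vs-slow diagnosis is the same.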
