Replies inline.

On Mon, Nov 2, 2009 at 3:15 AM, Zhang Bingjun (Eddy) <[email protected]> wrote:
> Dear Khurana,
>
> We didn't use MapRunnable. Instead, we used the package
> org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper directly and
> passed our normal Mapper class to it via its setMapperClass() method. We
> set the number of threads using setNumberOfThreads(). Is this a correct
> way of doing a multithreaded mapper?

I was just curious how you did it. This is the right way, afaik.

> We noticed that hadoop-0.20.1 has another MultithreadedMapper,
> org.apache.hadoop.mapred.lib.map.MultithreadedMapper, but we didn't
> touch it.

That's the deprecated package. You used the correct one.

> It might be that some thread didn't return. We need to do some work to
> confirm that. We will also try to enable DEBUG mode in Hadoop. Could you
> share some info on starting a Hadoop daemon or the whole cluster in
> debug mode?

You'll have to edit the log4j.properties file in $HADOOP_HOME/conf/. After
editing, you'll have to restart the daemons (or the entire cluster). The
DEBUG logs might give some more info on what's happening.

> Thanks a lot!
>
> Best regards,
> Zhang Bingjun (Eddy)
>
> E-mail: [email protected], [email protected], [email protected]
> Tel No: +65-96188110 (M)
>
>
> On Mon, Nov 2, 2009 at 6:58 PM, Zhang Bingjun (Eddy) <[email protected]> wrote:
>
> > Hi all,
> >
> > An important observation: the 100% mappers that never complete all have
> > temporary files of exactly 64MB, which means the output of the mapper
> > is cut off at the block boundary. However, we do have some successfully
> > completed mappers with output files larger than 64MB, and we also have
> > some mappers below 100% with temporary files larger than 64MB.
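As a reference for the DEBUG suggestion above, here is a minimal sketch of the log4j.properties edit. The property and logger names follow the stock 0.20 conf/ file, but treat them as assumptions and check them against your own copy:

```properties
# $HADOOP_HOME/conf/log4j.properties (sketch)

# Raise the default logging level from INFO to DEBUG for all daemons:
hadoop.root.logger=DEBUG,console

# Or, less noisily, turn on DEBUG only for the MapReduce classes
# (the package name here is illustrative; narrow it to what you need):
log4j.logger.org.apache.hadoop.mapred=DEBUG
```

Restart the affected daemons afterwards so the new level takes effect.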
> > Here is the info returned by "hadoop fs -ls
> > /hadoop/music/track/audio/track_1/_temporary/_attempt_200911021416_0001_m_000091_0":
> >
> > -rw-r--r--   3 hadoop supergroup   67108864 2009-11-02 14:29
> > /hadoop/music/track/audio/track_1/_temporary/_attempt_200911021416_0001_m_000091_0/part-m-00091
> >
> > This is the temporary file of a 100% mapper that never completed.
> >
> > Any clues on this?
> >
> > Best regards,
> > Zhang Bingjun (Eddy)
> >
> > E-mail: [email protected], [email protected], [email protected]
> > Tel No: +65-96188110 (M)
> >
> >
> > On Mon, Nov 2, 2009 at 6:52 PM, Amandeep Khurana <[email protected]> wrote:
> >
> >> On Mon, Nov 2, 2009 at 2:40 AM, Zhang Bingjun (Eddy) <[email protected]> wrote:
> >>
> >> > Hi Pallavi, Khurana, and Vasekar,
> >> >
> >> > Thanks a lot for your replies. To add: the mapper we are using is the
> >> > multithreaded mapper.
> >>
> >> How are you doing this? Did you write your own MapRunnable?
> >>
> >> > To answer your questions:
> >> >
> >> > Pallavi, Khurana: I have checked the logs. The key it got stuck on is
> >> > the last key it read in. Since the progress is 100%, I suppose that is
> >> > the last key? From the stdout log of our mapper, we confirmed that the
> >> > map function has completed. After that, no more keys were read in and
> >> > no further progress was made by the mapper, which means it didn't
> >> > complete / commit despite being at 100%. Each job has a different
> >> > number of stuck mappers, but it is roughly one third to half of them.
> >> > From the stdout logs, we also confirmed that the map function of the
> >> > mapper has finished. That's why we started to suspect that the
> >> > MapReduce framework has something to do with the stuck problem.
> >> > Here is the log from stdout:
> >> >
> >> > [entry] [293419] <track><name>i bealive</name><artist>Simian Mobile Disco</artist></track>
> >> > [0] [293419] start creating objects
> >> > [1] [293419] start parsing xml
> >> > [2] [293419] start updating data
> >> > [sleep] [228312]
> >> > [error] [228312] java.io.IOException: [error] [228312] reaches the maximum number of attempts whiling updating
> >> > [3] [228312] start collecting output228312
> >> > [3.1 done with null] [228312] done228312
> >> > [fail] [228312] java.io.IOException: 3.1 throw null228312
> >> > [done] [228312] done228312
> >> > [sleep] [293419]
> >> > [error] [293419] java.io.IOException: [error] [293419] reaches the maximum number of attempts whiling updating
> >> > [3] [293419] start collecting output293419
> >> > [3.1 done with null] [293419] done293419
> >> > [fail] [293419] java.io.IOException: 3.1 throw null293419
> >> > [done] [293419] done293419
> >> >
> >> > Here is the log from the tasktracker:
> >> >
> >> > 2009-11-02 16:58:23,518 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200911021416_0001_m_000047_1 1.0% name: 梟 artist: Plastic Tree
> >> > 2009-11-02 16:58:50,527 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200911021416_0001_m_000047_1 1.0% name: Zydeko artist: Cirque du Soleil
> >> > 2009-11-02 16:59:23,539 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200911021416_0001_m_000047_1 1.0% name: www.China.ie artist: www.China.ie
> >> > 2009-11-02 16:59:50,550 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200911021416_0001_m_000047_1 1.0% name: www.China.ie artist: www.China.ie
> >> > 2009-11-02 17:00:11,560 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian Mobile Disco
> >> > 2009-11-02 17:00:23,565 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian Mobile Disco
> >> > 2009-11-02 17:01:11,585 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian Mobile Disco
> >> >
> >> > From these logs, we can see that the last entry read in is "i bealive
> >> > artist: Simian Mobile Disco". The last entry processed in the mapper
> >> > is the same as this entry, and from the stdout log we can see the map
> >> > function has finished....
> >>
> >> Put some stdout or logging code towards the end of the mapper and also
> >> check if all threads are coming back. Do you think it could be some
> >> issue with the threads?
> >>
> >> > Vasekar: The HDFS is healthy. We haven't stored too many small files
> >> > in it yet. The output of the command "hadoop fsck /" is as follows:
> >> >
> >> >  Total size:    89114318394 B (Total open files size: 19845943808 B)
> >> >  Total dirs:    430
> >> >  Total files:   1761 (Files currently being written: 137)
> >> >  Total blocks (validated):      2691 (avg. block size 33115688 B) (Total open file blocks (not validated): 309)
> >> >  Minimally replicated blocks:   2691 (100.0 %)
> >> >  Over-replicated blocks:        0 (0.0 %)
> >> >  Under-replicated blocks:       0 (0.0 %)
> >> >  Mis-replicated blocks:         0 (0.0 %)
> >> >  Default replication factor:    3
> >> >  Average block replication:     2.802304
> >> >  Corrupt blocks:                0
> >> >  Missing replicas:              0 (0.0 %)
> >> >  Number of data-nodes:          76
> >> >  Number of racks:               1
> >> >
> >> > Is this problem possibly due to stuck communication between the actual
> >> > task (the mapper) and the tasktracker? From the logs, we cannot see
> >> > anything after it gets stuck.
> >>
> >> The TT and JT logs would show if there is lost communication. Enable
> >> DEBUG logging for the processes and keep a tab.
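The "check if all threads are coming back" suggestion can be illustrated with a small self-contained sketch (plain Java, no Hadoop dependencies; the fixed pool merely stands in for MultithreadedMapper's worker threads and is not the framework's actual code). If even one worker never returns, the pool never terminates, which matches a task that reports 100% but never commits:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class StuckThreadDemo {
    // Returns true if all worker threads finished within the timeout.
    static boolean allThreadsReturned(Runnable work, int nThreads, long timeoutMs)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        for (int i = 0; i < nThreads; i++) {
            pool.submit(work);
        }
        pool.shutdown(); // accept no new tasks; wait for the submitted ones
        boolean done = pool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        pool.shutdownNow(); // interrupt any stuck workers so the JVM can exit
        return done;
    }

    public static void main(String[] args) throws InterruptedException {
        // Well-behaved "map" work returns promptly: prints true.
        System.out.println(allThreadsReturned(() -> { }, 4, 1000));

        // Work that blocks forever models a mapper thread that never
        // returns: the pool never terminates, so this prints false.
        CountDownLatch never = new CountDownLatch(1);
        Runnable stuck = () -> {
            try {
                never.await();
            } catch (InterruptedException e) {
                // released by shutdownNow()
            }
        };
        System.out.println(allThreadsReturned(stuck, 4, 500));
    }
}
```

A thread dump (kill -QUIT on the stuck task's JVM) would show the same thing directly: worker threads parked on whatever call never returns.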
> >> > From: Amandeep Khurana <[email protected]>
> >> > Reply-To: [email protected]
> >> > To: [email protected]
> >> > Date: Mon, Nov 2, 2009 at 4:36 PM
> >> > Subject: Re: too many 100% mapper does not complete / finish / commit
> >> >
> >> > Did you try to add any logging and see what keys they are getting
> >> > stuck on, or what the last keys they processed were? Do the same
> >> > number of mappers get stuck every time?
> >> >
> >> > Not having reducers is not a problem. It's pretty normal to do that.
> >> >
> >> > From: Amogh Vasekar <[email protected]>
> >> > Reply-To: [email protected]
> >> > To: "[email protected]" <[email protected]>
> >> > Date: Mon, Nov 2, 2009 at 4:50 PM
> >> > Subject: Re: too many 100% mapper does not complete / finish / commit
> >> >
> >> > Hi,
> >> > Quick questions...
> >> > Are you creating too many small files?
> >> > Are there any task-side files being created?
> >> > Does the heap for the NN have enough space to list the metadata? Any
> >> > details on its general health will probably be helpful to people on
> >> > the list.
> >> >
> >> > Amogh
> >> >
> >> > Best regards,
> >> > Zhang Bingjun (Eddy)
> >> >
> >> > E-mail: [email protected], [email protected], [email protected]
> >> > Tel No: +65-96188110 (M)
> >> >
> >> >
> >> > On Mon, Nov 2, 2009 at 4:51 PM, Palleti, Pallavi <[email protected]> wrote:
> >> >
> >> > > Hi Eddy,
> >> > >
> >> > > I faced a similar issue when I used a Pig script for fetching
> >> > > webpages for certain URLs.
> >> > > I could see the map phase showing 100% while it was still running.
> >> > > As I was logging the page currently being fetched, I could see the
> >> > > process hadn't yet finished. It might be the same issue. So, you can
> >> > > add logging to check whether it is actually stuck or the process is
> >> > > still going on.
> >> > >
> >> > > Thanks
> >> > > Pallavi
> >> > >
> >> > > ________________________________
> >> > >
> >> > > From: Zhang Bingjun (Eddy) [mailto:[email protected]]
> >> > > Sent: Monday, November 02, 2009 2:03 PM
> >> > > To: [email protected]; [email protected]; [email protected]; [email protected]
> >> > > Subject: too many 100% mapper does not complete / finish / commit
> >> > >
> >> > > Dear hadoop fellows,
> >> > >
> >> > > We have been using Hadoop-0.20.1 MapReduce to crawl some web data.
> >> > > In this case, we only have mappers, which crawl data and save it
> >> > > into HDFS in a distributed way. No reducers are specified in the
> >> > > job conf.
> >> > >
> >> > > The problem is that in every job about one third of the mappers get
> >> > > stuck at 100% progress but never complete. If we look at the
> >> > > tasktracker logs of those mappers, the last log line is the key
> >> > > input INFO line, and no other logs are output after that.
> >> > >
> >> > > From the stdout log of a specific attempt of one of those mappers,
> >> > > we can see that the map function of the mapper has finished
> >> > > completely, so control of the execution should be somewhere in the
> >> > > MapReduce framework.
> >> > >
> >> > > Does anyone have any clue about this problem? Is it because we
> >> > > didn't use any reducers? Since two thirds of the mappers could
> >> > > complete successfully and commit their output data into HDFS, I
> >> > > suspect the stuck mappers have something to do with the MapReduce
> >> > > framework code?
> >> > >
> >> > > Any input will be appreciated. Thanks a lot!
> >> > >
> >> > > Best regards,
> >> > > Zhang Bingjun (Eddy)
> >> > >
> >> > > E-mail: [email protected], [email protected], [email protected]
> >> > > Tel No: +65-96188110 (M)
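Pallavi's suggestion earlier in the thread — log the item currently being processed so you can tell a stuck task from a merely slow one — can be sketched in plain Java. This is an illustration only (no Hadoop dependencies; class names and intervals are made up for the example): a daemon watchdog thread periodically reports the last key seen, so if the report stops changing while the task stays alive, the loop is stuck:

```java
import java.util.concurrent.atomic.AtomicReference;

public class ProgressWatchdog {
    public static void main(String[] args) throws InterruptedException {
        // Shared cell the "map loop" updates and the watchdog reads.
        AtomicReference<String> lastKey = new AtomicReference<>("<none>");

        // Watchdog: periodically report the last key processed.
        Thread watchdog = new Thread(() -> {
            try {
                while (true) {
                    Thread.sleep(100);
                    System.out.println("[watchdog] last key: " + lastKey.get());
                }
            } catch (InterruptedException e) {
                // normal shutdown
            }
        });
        watchdog.setDaemon(true);
        watchdog.start();

        // Simulated map loop over a few records.
        for (String key : new String[] {"k1", "k2", "k3"}) {
            lastKey.set(key);
            Thread.sleep(120); // stand-in for per-record work
        }

        watchdog.interrupt();
        watchdog.join();
        System.out.println("map loop finished; last key = " + lastKey.get());
    }
}
```

In a real mapper you would log from within map() itself (or use the task's status/counter reporting) rather than printing, but the stuck-vs-slow diagnosis is the same.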
