RE: Hadoop-streaming using binary executable c program

Daniel Yehdego Fri, 02 Dec 2011 19:32:59 -0800



Hi.........

I was trying to run Hadoop streaming, and before that I checked with the 
following:

bin/hadoop fs -cat 
/user/yehdego/Hadoop-Data-New/RF00171_A.bpseqL3G1_seg_Optimized_Method.txt | 
head -2 | ./HADOOP

where HADOOP is a shell script:

#!/bin/sh
rm -f temp.txt;
while read line
do
  echo $line >> temp.txt;
done
exec /data/yehdego/hadoop-0.20.2/PKNOTSRG/src/bin/pknotsRG -k o -F temp.txt;

and it's working, but when I try running it on streaming using the following:

bin/hadoop jar /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar -mapper 
./HADOOP  -file /data/yehdego/hadoop-0.20.2/HADOOP -file 
/data/yehdego/hadoop-0.20.2/PKNOTSRG/src/bin/pknotsRG -reducer 
./ReduceLatest.py -file /data/yehdego/hadoop-0.20.2/ReduceLatest.py -input 
/user/yehdego/Hadoop-Data-New/RF00171_A.bpseqL3G1_seg_Optimized_Method.txt  
-output /user/yehdego/RF171_NEW/RF00171_A.bpseqL3G1_Optimized_Method40.txt 
-verbose

it failed with the following error:

PipeMapRed.waitOutputThreads(): subprocess failed with code 126
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

Any idea on this problem ?
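
The exit codes in this thread (1, 139, 126) can often be read with the ordinary shell conventions: 126 usually means the command was found but could not be executed, 127 means it was not found, and values above 128 usually mean the process was killed by a signal (139 = 128 + 11, SIGSEGV, which matches the segmentation fault seen earlier in the thread). The sketch below only illustrates that convention; describe_exit is a name made up for this example and is not part of Hadoop.

```shell
#!/bin/sh
# Translate a subprocess exit status into a hint, following the usual
# shell conventions. describe_exit is a hypothetical helper name.
describe_exit() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "success"
    elif [ "$code" -eq 126 ]; then
        echo "command found but not executable"
    elif [ "$code" -eq 127 ]; then
        echo "command not found"
    elif [ "$code" -gt 128 ]; then
        # By convention 128 + N means "terminated by signal N".
        echo "killed by signal $((code - 128))"
    else
        echo "program reported error $code"
    fi
}

describe_exit 126   # prints: command found but not executable
describe_exit 139   # prints: killed by signal 11
```

If that reading applies here, one thing worth checking is whether /data/yehdego/hadoop-0.20.2/PKNOTSRG/src/bin/pknotsRG exists and is executable on every task node. Files shipped with -file appear in the task's working directory (the earlier syslog shows work/./pknotsRG), so the script may need to call ./pknotsRG instead of the absolute path. This is a guess from the exit code, not a confirmed diagnosis.
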
Regards, 

Daniel T. Yehdego
Computational Science Program 
University of Texas at El Paso, UTEP 
[email protected]

> From: [email protected]
> To: [email protected]
> Date: Mon, 25 Jul 2011 14:47:34 -0700
> Subject: Re: Hadoop-streaming using binary executable c program
> 
> This is likely to be slow and it is not ideal.  The ideal would be to modify 
> pknotsRG to be able to read from stdin, but that may not be possible.
> 
> The shell script would probably look something like the following
> 
> #!/bin/sh
> rm -f temp.txt;
> while read line
> do
>   echo $line >> temp.txt;
> done
> exec pknotsRG temp.txt;
> 
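
A possible hardening of the wrapper above, offered as a sketch rather than a tested replacement: it copies stdin with cat instead of a read loop (which strips leading whitespace and interprets backslashes), uses mktemp for a unique temporary file, and removes that file afterwards. The function name run_on_stdin and the PKNOTSRG_BIN variable are invented for this illustration; the default ./pknotsRG assumes the binary was shipped with -file.

```shell
#!/bin/sh
# Sketch: save stdin to a unique temporary file, run a file-based program
# on it, then clean up. run_on_stdin and PKNOTSRG_BIN are hypothetical names.
run_on_stdin() {
    bin=${PKNOTSRG_BIN:-./pknotsRG}
    tmp=$(mktemp "${TMPDIR:-/tmp}/pknots.XXXXXX") || return 1
    cat > "$tmp"          # copy everything from stdin unchanged
    "$bin" "$tmp"         # the program reads the file, writes to stdout
    status=$?
    rm -f "$tmp"
    return "$status"      # pass the program's exit status through
}
```

A mapper script built on this would end with a call to run_on_stdin, so that the exit status of the wrapped program becomes the exit status that streaming sees.
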
> Place it in a file, say hadoopPknotsRG.  Then you probably want to run
> 
> chmod +x hadoopPknotsRG
> 
> After that you want to test it with
> 
> hadoop fs -cat 
> /user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt | head -2 | 
> ./hadoopPknotsRG
> 
> If that works then you can try it with Hadoop streaming
> 
> HADOOP_HOME$ bin/hadoop jar 
> /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar -mapper 
> ./hadoopPknotsRG -file /data/yehdego/hadoop-0.20.2/pknotsRG -file 
> /data/yehdego/hadoop-0.20.2/hadoopPknotsRG -input 
> /user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output 
> /user/yehdego/RF-out -reducer NONE -verbose
> 
> --Bobby
> 
> On 7/25/11 3:37 PM, "Daniel Yehdego" <[email protected]> wrote:
> 
> 
> 
> Good afternoon Bobby,
> 
> Thanks, you were a great help in finding out what the problem was. After I 
> ran the command line you suggested, I found out that there was a 
> segmentation fault.
> The binary executable program pknotsRG only reads a file with a sequence in 
> it. This means there should be a shell script, as you have said, that will 
> take the data coming
> from stdin and write it to a temporary file. Any idea on how to do this job 
> in a shell script? The thing is I am from a biology background and don't have 
> much experience in CS.
> Looking forward to hearing from you. Thanks so much.
> 
> Regards,
> 
> Daniel T. Yehdego
> Computational Science Program
> University of Texas at El Paso, UTEP
> [email protected]
> 
> > From: [email protected]
> > To: [email protected]
> > Date: Fri, 22 Jul 2011 12:39:08 -0700
> > Subject: Re: Hadoop-streaming using binary executable c program
> >
> > I would suggest that you do the following to help you debug.
> >
> > hadoop fs -cat 
> > /user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt | head -2 
> > | /data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG -
> >
> > This is simulating what hadoop streaming is doing.  Here we are taking the 
> > first 2 lines out of the input file and feeding them to the stdin of 
> > pknotsRG.  The first step is to make sure that you can get your program to 
> > run correctly with something like this.  You may need to change the command 
> > line to pknotsRG to get it to read the data it is processing from stdin, 
> > instead of from a file.  Alternatively you may need to write a shell script 
> > that will take the data coming from stdin, write it to a file, and then 
> > call pknotsRG on that temporary file.  Once you have this working then you 
> > should try it again with streaming.
> >
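
The local check described above can be wrapped in a small helper, sketched here under the assumption that the input has already been copied out of HDFS into a local file. simulate_mapper is a hypothetical name; it pipes the first few lines of the file into any mapper command and reports the exit status on stderr, which is roughly the number streaming would report for a failed task.

```shell
#!/bin/sh
# Sketch: feed the first LINES lines of a local file to a mapper on stdin.
# simulate_mapper is a hypothetical helper name. Usage:
#   simulate_mapper INPUT_FILE LINES MAPPER [MAPPER_ARGS...]
simulate_mapper() {
    input=$1
    lines=$2
    shift 2
    head -n "$lines" "$input" | "$@"
    status=$?             # exit status of the mapper (last pipeline command)
    echo "mapper exit status: $status" >&2
    return "$status"
}
```

For example, simulate_mapper sample.txt 2 ./hadoopPknotsRG would run the wrapper script on the first two sequences of a local sample.txt.
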
> > --Bobby Evans
> >
> > On 7/22/11 12:31 PM, "Daniel Yehdego" <[email protected]> wrote:
> >
> >
> >
> > Hi Bobby, Thanks for the response.
> >
> > After I tried the following command:
> >
> > bin/hadoop jar $HADOOP_HOME/hadoop-0.20.2-streaming.jar -mapper 
> > /data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG -  -file 
> > /data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG  -reducer NONE -input 
> > /user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output 
> > /user/yehdego/RF-out - verbose
> >
> > I got these stderr logs:
> >
> > java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess 
> > failed with code 139
> >         at 
> > org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
> >         at 
> > org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
> >         at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
> >         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> >         at 
> > org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
> >         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> >         at org.apache.hadoop.mapred.Child.main(Child.java:170)
> >
> >
> >
> > syslog logs
> >
> > 2011-07-22 13:02:27,467 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
> > Initializing JVM Metrics with processName=MAP, sessionId=
> > 2011-07-22 13:02:27,913 INFO org.apache.hadoop.mapred.MapTask: 
> > numReduceTasks: 0
> > 2011-07-22 13:02:28,149 INFO org.apache.hadoop.streaming.PipeMapRed: 
> > PipeMapRed exec 
> > [/data/yehdego/hadoop_tmp/dfs/local/taskTracker/jobcache/job_201107181535_0079/attempt_201107181535_0079_m_000000_0/work/./pknotsRG]
> > 2011-07-22 13:02:28,242 INFO org.apache.hadoop.streaming.PipeMapRed: 
> > R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
> > 2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: 
> > MROutputThread done
> > 2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: 
> > MRErrorThread done
> > 2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: 
> > PipeMapRed failed!
> > 2011-07-22 13:02:28,361 WARN org.apache.hadoop.mapred.TaskTracker: Error 
> > running child
> > java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess 
> > failed with code 139
> >         at 
> > org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
> >         at 
> > org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
> >         at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
> >         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> >         at 
> > org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
> >         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> >         at org.apache.hadoop.mapred.Child.main(Child.java:170)
> > 2011-07-22 13:02:28,395 INFO org.apache.hadoop.mapred.TaskRunner: Runnning 
> > cleanup for the task
> >
> >
> >
> > Regards,
> >
> > Daniel T. Yehdego
> > Computational Science Program
> > University of Texas at El Paso, UTEP
> > [email protected]
> >
> > > From: [email protected]
> > > To: [email protected]; [email protected]
> > > Date: Fri, 22 Jul 2011 09:12:18 -0700
> > > Subject: Re: Hadoop-streaming using binary executable c program
> > >
> > > It looks like it tried to run your program and the program exited with a 
> > > 1, not a 0.  What are the stderr logs like for the mappers that were 
> > > launched?  You should be able to access them through the Web GUI.  You 
> > > might want to add some stderr log messages to your C program too, to be 
> > > able to debug how far along it is going before exiting.
> > >
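
When the C source is not easy to change, similar stderr messages can be produced from a wrapper instead. The sketch below is one way to do that; run_logged is a hypothetical helper name. It writes only to stderr, so stdout still carries nothing but the real output of the wrapped program.

```shell
#!/bin/sh
# Sketch: run a command, reporting its start and exit status on stderr.
# run_logged is a hypothetical helper name for this example.
run_logged() {
    echo "starting: $*" >&2
    "$@"
    status=$?
    echo "exit status: $status" >&2
    return "$status"
}
```

Used as run_logged ./pknotsRG some_file inside a mapper script, this should leave a line in the task's stderr log showing whether the program started and how it ended.
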
> > > --Bobby Evans
> > >
> > > On 7/22/11 9:19 AM, "Daniel Yehdego" <[email protected]> wrote:
> > >
> > > I am trying to parallelize some very long RNA sequences for the sake of
> > > predicting their RNA 2D structures. I am using a binary executable C
> > > program called pknotsRG as my mapper. I tried the following bin/hadoop
> > > command:
> > >
> > > HADOOP_HOME$ bin/hadoop
> > > jar /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar
> > > -mapper /data/yehdego/hadoop-0.20.2/pknotsRG
> > > -file /data/yehdego/hadoop-0.20.2/pknotsRG
> > > -input /user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt
> > > -output /user/yehdego/RF-out -reducer NONE -verbose
> > >
> > > but I keep getting the following error message:
> > >
> > > java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess
> > > failed with code 1
> > >         at
> > > org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
> > >         at
> > > org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
> > >         at 
> > > org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
> > >         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> > >         at 
> > > org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
> > >         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> > >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> > >         at org.apache.hadoop.mapred.Child.main(Child.java:170)
> > >
> > > FYI: my input file is RF00028_B.bpseqL3G5_seg_Centered_Method.txt, which
> > > is a chunk of RNA sequences. The mapper is expected to read the input
> > > line by line and output the predicted structure for each line of
> > > sequence, for a specified number of maps. Any
> > > help on this problem is really appreciated. Thanks.
> > >
> > >
> >
> >
> 
>