Hi Bobby,
I just want to ask you if there is a way of using a reducer, or something like
concatenation, to glue together the outputs from my mappers and output
them as a single file and a single segment of the predicted RNA 2D structure?
FYI: I have used -reducer NONE so far:
HADOOP_HOME$ bin/hadoop jar
/data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar -mapper
./hadoopPknotsRG -file /data/yehdego/hadoop-0.20.2/pknotsRG -file
/data/yehdego/hadoop-0.20.2/hadoopPknotsRG -input
/user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output
/user/yehdego/RF-out -reducer NONE -verbose
A sample of my output, from mappers on two different slave nodes, looks
like this:
AUACCCGCAAAUUCACUCAAAUCUGUAAUAGGUUUGUCAUUCAACCCCAAAAAUCUAGUGCAAAUAUUACUUUCGCCAAUUAGGUAUAAUAAUGGUAAGC
[[[[[..................((((.(((((((...............))))))).))))............{{{{....]]]]].....}}}}.... (-13.46)
and
GGGACAAGACUCGACAUUUGAUACACUAUUUAUCAAUGGAUGUCUUUUUUCU
((((.(((((....((.((((((.......))))))))...))))).)))). (-11.00)
and I want to concatenate them and output them as a single predicted RNA sequence
and structure:
AUACCCGCAAAUUCACUCAAAUCUGUAAUAGGUUUGUCAUUCAACCCCAAAAAUCUAGUGCAAAUAUUACUUUCGCCAAUUAGGUAUAAUAAUGGUAAGCGGGACAAGACUCGACAUUUGAUACACUAUUUAUCAAUGGAUGUCUUUUUUCU
[[[[[..................((((.(((((((...............))))))).))))............{{{{....]]]]].....}}}}....((((.(((((....((.((((((.......))))))))...))))).)))).
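To make the question concrete, here is a rough sketch of the kind of gluing I
mean, done as a post-processing step outside of Hadoop rather than as a real
reducer. It assumes each prediction is a sequence line followed by a
dot-bracket line, with the number in parentheses either at the end of that
line or on a line of its own, as in the sample above. It also assumes the
segments arrive in their original order, which I am not sure Hadoop guarantees
across map tasks. The function name join_structures is made up for illustration.

```sh
#!/bin/sh
# join_structures (hypothetical helper name): read pknotsRG-style output on
# stdin and print one concatenated sequence line and one concatenated
# structure line. Assumes the segments arrive in their original order.
join_structures() {
  awk '
    # Skip blank lines.
    NF == 0 { next }
    # A first field made only of nucleotide letters is a sequence segment.
    $1 ~ /^[ACGUTNacgutn]+$/ { seq = seq $1; next }
    {
      # Strip every dot-bracket character from the first field. If nothing
      # is left over, the field is a structure segment. A bare number such
      # as (-13.46) leaves digits behind, so it is ignored, and a number
      # following the structure sits in the second field and is dropped.
      rest = $1
      gsub(/[.(){}<>]/, "", rest)
      gsub(/\[/, "", rest)
      gsub(/\]/, "", rest)
      if (rest == "") str = str $1
    }
    END { print seq; print str }
  '
}
```

After defining the function in a shell, it could be fed the job output with
something like:
hadoop fs -cat /user/yehdego/RF-out/part-* | join_structures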
Regards,
Daniel T. Yehdego
Computational Science Program
University of Texas at El Paso, UTEP
[email protected]
> From: [email protected]
> To: [email protected]
> Subject: RE: Hadoop-streaming using binary executable c program
> Date: Tue, 26 Jul 2011 16:23:10 +0000
>
>
> Good afternoon Bobby,
>
> Thanks so much, it is now working excellently, and the speed is also reasonable.
> Once again, thank you.
>
> Regards,
>
> Daniel T. Yehdego
> Computational Science Program
> University of Texas at El Paso, UTEP
> [email protected]
>
> > From: [email protected]
> > To: [email protected]
> > Date: Mon, 25 Jul 2011 14:47:34 -0700
> > Subject: Re: Hadoop-streaming using binary executable c program
> >
> > This is likely to be slow and it is not ideal. The ideal would be to
> > modify pknotsRG to be able to read from stdin, but that may not be possible.
> >
> > The shell script would probably look something like the following
> >
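> > #!/bin/sh
> > # Copy everything arriving on stdin into a temporary file, because
> > # pknotsRG reads its sequence from a file rather than from stdin.
> > rm -f temp.txt
> > while IFS= read -r line
> > do
> >   printf '%s\n' "$line" >> temp.txt
> > done
> > # Replace the shell with pknotsRG running on the temporary file.
> > exec pknotsRG temp.txt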
> >
> > Place it in a file, say hadoopPknotsRG. Then you probably want to run
> >
> > chmod +x hadoopPknotsRG
> >
> > After that you want to test it with
> >
> > hadoop fs -cat
> > /user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt | head -2
> > | ./hadoopPknotsRG
> >
> > If that works then you can try it with Hadoop streaming
> >
> > HADOOP_HOME$ bin/hadoop jar
> > /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar -mapper
> > ./hadoopPknotsRG -file /data/yehdego/hadoop-0.20.2/pknotsRG -file
> > /data/yehdego/hadoop-0.20.2/hadoopPknotsRG -input
> > /user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output
> > /user/yehdego/RF-out -reducer NONE -verbose
> >
> > --Bobby
> >
> > On 7/25/11 3:37 PM, "Daniel Yehdego" <[email protected]> wrote:
> >
> >
> >
> > Good afternoon Bobby,
> >
> > Thanks, you were a great help in finding out what the problem was. After
> > I ran the command line you suggested, I found out that there was a
> > segmentation fault.
> > The binary executable program pknotsRG only reads a file with a sequence in
> > it. This means there should be a shell script, as you have said, that will
> > take the data coming
> > from stdin and write it to a temporary file. Any idea how to do this
> > in a shell script? I am from a biology background and don't have
> > much experience in CS.
> > Looking forward to hearing from you. Thanks so much.
> >
> > Regards,
> >
> > Daniel T. Yehdego
> > Computational Science Program
> > University of Texas at El Paso, UTEP
> > [email protected]
> >
> > > From: [email protected]
> > > To: [email protected]
> > > Date: Fri, 22 Jul 2011 12:39:08 -0700
> > > Subject: Re: Hadoop-streaming using binary executable c program
> > >
> > > I would suggest that you do the following to help you debug.
> > >
> > > hadoop fs -cat
> > > /user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt | head
> > > -2 | /data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG -
> > >
> > > This is simulating what hadoop streaming is doing. Here we are taking
> > > the first 2 lines out of the input file and feeding them to the stdin of
> > > pknotsRG. The first step is to make sure that you can get your program
> > > to run correctly with something like this. You may need to change the
> > > command line to pknotsRG to get it to read the data it is processing from
> > > stdin, instead of from a file. Alternatively you may need to write a
> > > shell script that will take the data coming from stdin, write it to a
> > > file, and then call pknotsRG on that temporary file. Once you have this
> > > working then you should try it again with streaming.
> > >
> > > --Bobby Evans
> > >
> > > On 7/22/11 12:31 PM, "Daniel Yehdego" <[email protected]> wrote:
> > >
> > >
> > >
> > > Hi Bobby, Thanks for the response.
> > >
> > > After I tried the following command:
> > >
> > > bin/hadoop jar $HADOOP_HOME/hadoop-0.20.2-streaming.jar -mapper
> > > /data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG - -file
> > > /data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG -reducer NONE
> > > -input /user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt
> > > -output /user/yehdego/RF-out - verbose
> > >
> > > I got the following stderr logs:
> > >
> > > java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess
> > > failed with code 139
> > > at
> > > org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
> > > at
> > > org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
> > > at
> > > org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
> > > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> > > at
> > > org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
> > > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> > > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> > > at org.apache.hadoop.mapred.Child.main(Child.java:170)
> > >
> > >
> > >
> > > syslog logs
> > >
> > > 2011-07-22 13:02:27,467 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> > > Initializing JVM Metrics with processName=MAP, sessionId=
> > > 2011-07-22 13:02:27,913 INFO org.apache.hadoop.mapred.MapTask:
> > > numReduceTasks: 0
> > > 2011-07-22 13:02:28,149 INFO org.apache.hadoop.streaming.PipeMapRed:
> > > PipeMapRed exec
> > > [/data/yehdego/hadoop_tmp/dfs/local/taskTracker/jobcache/job_201107181535_0079/attempt_201107181535_0079_m_000000_0/work/./pknotsRG]
> > > 2011-07-22 13:02:28,242 INFO org.apache.hadoop.streaming.PipeMapRed:
> > > R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
> > > 2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed:
> > > MROutputThread done
> > > 2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed:
> > > MRErrorThread done
> > > 2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed:
> > > PipeMapRed failed!
> > > 2011-07-22 13:02:28,361 WARN org.apache.hadoop.mapred.TaskTracker: Error
> > > running child
> > > java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess
> > > failed with code 139
> > > at
> > > org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
> > > at
> > > org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
> > > at
> > > org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
> > > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> > > at
> > > org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
> > > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> > > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> > > at org.apache.hadoop.mapred.Child.main(Child.java:170)
> > > 2011-07-22 13:02:28,395 INFO org.apache.hadoop.mapred.TaskRunner:
> > > Runnning cleanup for the task
> > >
> > >
> > >
> > > Regards,
> > >
> > > Daniel T. Yehdego
> > > Computational Science Program
> > > University of Texas at El Paso, UTEP
> > > [email protected]
> > >
> > > > From: [email protected]
> > > > To: [email protected]; [email protected]
> > > > Date: Fri, 22 Jul 2011 09:12:18 -0700
> > > > Subject: Re: Hadoop-streaming using binary executable c program
> > > >
> > > > It looks like it tried to run your program and the program exited with
> > > > a 1, not a 0. What are the stderr logs like for the mappers that were
> > > > launched? You should be able to access them through the Web GUI. You
> > > > might want to add some stderr log messages to your C program too, to
> > > > be able to see how far along it gets before exiting.
> > > >
> > > > --Bobby Evans
> > > >
> > > > On 7/22/11 9:19 AM, "Daniel Yehdego" <[email protected]> wrote:
> > > >
> > > > I am trying to process some very long RNA sequences in parallel in order
> > > > to predict their RNA 2D structures. I am using a binary executable C
> > > > program called pknotsRG as my mapper. I tried the following bin/hadoop
> > > > command:
> > > >
> > > > HADOOP_HOME$ bin/hadoop
> > > > jar /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar
> > > > -mapper /data/yehdego/hadoop-0.20.2/pknotsRG
> > > > -file /data/yehdego/hadoop-0.20.2/pknotsRG
> > > > -input /user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt
> > > > -output /user/yehdego/RF-out -reducer NONE -verbose
> > > >
> > > > but I keep getting the following error message:
> > > >
> > > > java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess
> > > > failed with code 1
> > > > at
> > > > org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
> > > > at
> > > > org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
> > > > at
> > > > org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
> > > > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> > > > at
> > > > org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
> > > > at
> > > > org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> > > > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> > > > at org.apache.hadoop.mapred.Child.main(Child.java:170)
> > > >
> > > > FYI: my input file is RF00028_B.bpseqL3G5_seg_Centered_Method.txt, which
> > > > is a chunk of RNA sequences. The mapper is expected to read the input
> > > > file line by line and output the predicted
> > > > structure for each line of sequence, for a specified number of maps. Any
> > > > help on this problem is really appreciated. Thanks.