Thank you very much for your answers. Yesterday I was so exhausted that my brain wasn't working and I couldn't explain what I was looking for. I am actually trying to write a Perl script that compares pairs of sequences by the identity index reported by the BLAST program (bioinformatics). My input is a FASTA file (http://en.wikipedia.org/wiki/FASTA_format) with hundreds of sequences. I compare each sequence against every other one (BLAST does this) to find the pairs that are less than 95% homologous. The result is a two-column file listing the numbers of the two sequences in each pair with < 95% homology. Of course, if sequence 1 has < 95% homology with sequence 7, then sequence 7 also has < 95% homology with sequence 1, so every pair appears twice. I'm looking for a Perl script to remove that redundancy. I suppose I have to do this with hashes, but I don't know enough about them. Thank you very much for your help. I attach an input file and blast1.pl. Thanks a lot.
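A minimal sketch of the hash approach, assuming the BLAST results have already been reduced to a plain text file with two whitespace-separated sequence numbers per line (the exact layout of your file is an assumption, and the sub name `unique_pairs` is hypothetical). The trick is to build a canonical key by sorting the two IDs, so `1 7` and `7 1` land on the same hash entry; the first time a key is seen the pair is kept, and every later occurrence is skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return each unordered pair of IDs once, in the order first seen.
# Assumes every data line holds two whitespace-separated sequence IDs.
sub unique_pairs {
    my @lines = @_;
    my %seen;     # canonical key "lo,hi" => number of times seen
    my @unique;
    for my $line (@lines) {
        my ($x, $y) = split ' ', $line;
        next unless defined $y;                  # skip blank or malformed lines
        # Sorting the two IDs gives "1 7" and "7 1" the same key.
        # A string sort is enough: we only need a consistent order.
        my ($lo, $hi) = sort { $a cmp $b } $x, $y;
        next if $seen{"$lo,$hi"}++;              # pair already kept
        push @unique, [$x, $y];                  # keep the orientation seen first
    }
    return @unique;
}

# Run as: perl unique_pairs.pl blast_pairs.txt
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    print "$_->[0]\t$_->[1]\n" for unique_pairs(<$fh>);
    close $fh;
}
```

For input lines `1 7`, `7 1` and `2 3`, this prints `1 7` and `2 3` (tab-separated). The `$seen{...}++` idiom returns the old count, so it is false only the first time a key appears.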
PS: Sorry! I was trying to do this with BioPerl!

2010/9/16 Jordi Durban <jordi.dur...@gmail.com>
> Hi all!
> I would like to ask you something. I have a file as follows:
>
>     File A     File B
>     uid = 1    uid = 4
>     uid = 2    uid = 3
>     uid = 3    uid = 2
>     uid = 4    uid = 1
>
> I'm trying to make a Perl script that gives me only those lines whose
> numbers are the same regardless of which column they are in. That is, in
> the example, I want the second and third lines to appear only once.
> Thank you
>
> --
> Jordi

--
Jordi
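For the layout in the earlier quoted question, here is a sketch of a different hash technique, a hash of hashes, assuming each data line contains two `uid = N` fields (the regex and the sub name `dedupe_uid_lines` are illustrative assumptions). Each kept pair is recorded as `$seen{$first}{$second}`, and a line is dropped if that pair, or its reverse, is already recorded.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Keep a line only if its pair of uids has not been seen in either order.
# Assumes lines look like "uid = 1    uid = 4", as in the example above.
sub dedupe_uid_lines {
    my @lines = @_;
    my %seen;    # hash of hashes: $seen{$first}{$second} = 1
    my @kept;
    for my $line (@lines) {
        # In list context, //g returns every captured number on the line.
        my ($first, $second) = $line =~ /uid\s*=\s*(\d+)/g;
        next unless defined $second;    # header ("File A File B") or blank
        next if $seen{$first}{$second} || $seen{$second}{$first};
        $seen{$first}{$second} = 1;
        push @kept, $line;
    }
    return @kept;
}

# Run as: perl dedupe_uid.pl file.txt
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    print for dedupe_uid_lines(<$fh>);
    close $fh;
}
```

On the four example lines this keeps `uid = 1 uid = 4` and `uid = 2 uid = 3`, since the third and fourth lines are the reverses of those two.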
>1 uaccno=FGX3UK402FFRW9
CLSGWSSYEGSCYKFFQQRMNWADAERFCSEQAKGGHLVSFQSDGETDFVVKLVTKNIQSRDLYAWIGLRVQNKEKQCS
>2 uaccno=FGX3UK402ICJCT
CLSGWSSYEGSCYKFFQQRMNWADAERFCSEQAKGGHLVSFQSDGETDFVVKLVTKNIQSRDLYAWIGLRVQNKEKQCS
>3 uaccno=FGX3UK402IC6J3
CLSGWSSYEGSCYKFFQQRMNWADAERFCSEQAKGGHLVSFQSDGETDFVVKLVTKNIQSRDLYAWIGLRVQNKEKQCS
>4 uaccno=FGX3UK402JZ9PW
CLSGWSSYEGSCYKFFQQRMNWADAERFCSEQAKGGHLVSFQSDGETDFVVKLVTKNIQSRDLYAWIGLRVQNKEKQCS
>5 uaccno=FGX3UK402FMWRM
CLSGWSSYEGSCYKFFQQRMNWADAERFCSEQAKGGHLVSFQSDGETDFVVKLVTKNIQSRDLYAWIGLRVQNKEKQCS
>6 uaccno=FGSMDPN09FKBBG
CLSGWSSYEGSCYKFFQQRMNWADAERFCSEQAKGGHLVSFQSDGETDFVVKLVTKNIQSRDLYAWIGLRVQNKEKQCS
>7 uaccno=FGX3UK402ILAN8
ADAERFCSEQAKGGHLVSIERFGREDEFVSNLITKNIQRGVSYVWIGMRIQNKEKQCSS
>8 uaccno=FGX3UK402ILAN8
WSSYEGHCYRFFKEIKKLGRMQR
>9 uaccno=FGX3UK402GIKOW
DCPSGWSSYEGHCYRFFKESKNWADAERFCSEQAKGGHLVSIERFGREDEFV
>10 uaccno=FGX3UK402FLOOP
DCLSGWSSYEGSCYKFFQQRMNWADAERFCSEQAKGGHLVSFQSDGETDFV
>11 uaccno=FGX3UK402JFTTO
NWDDAERFCSEQAKGGHLVSIESDEEASFVAQLVAPNIGEFTYYVWIGLRAEGKGQQCSSKWSDGSCVCYE
>12 uaccno=FGX3UK402IQ4I1
CYKFFQQKKSWDGVNWDDAERFCSEQAKGGHLVSIESDEEASFVAQLVAPNIGEFTYYVWIGLRAEGKGQQCSS
>13 uaccno=FGX3UK402JDACN
VNWDDAERFCSEQAKGGHLVSIESDEEASFVAQLVAPNIGEFTYYVWIGLRAEGKGQQCSSKWSDG
>14 uaccno=FHA3VDO14IHU72
DCLSGWSSYEGSCYKFFQQRMNWADAERFCSEQAKGGHL
>15 uaccno=FGX3UK402GO9M4
PRGWSSYEGHCYRFFKESKNWADAERFCSEQAKGGHLL
>16 uaccno=FGX3UK402HAG8Y
DCLSGWSSYEGSCYKVFKERMNWEDAEEFCTQQQTGGHLVSFQSKEEADF
>17 uaccno=FGX3UK402JGQW2
ERHCYRVFQRKMNWANAERWCAQQYKESHLVSFHSSEEVDFVVSLTYPILKATLVWTGLSNIWKECRLEWSDGTKVN
>18 uaccno=FGX3UK402I4A8M
VTWDDAERFCSEQAKGGHLVSIESDEEASFVAQLVAPNIGEFTYYVWIGLE
>19 uaccno=FGX3UK402I4A8M
RAEGKGQQCSSKWSDG
>20 uaccno=FGX3UK402IXGJF
DCPSGWSPYEGSCYKLFKKEMNWADAESLCALQRKECHLVSFHSSEEVDF
>21 uaccno=FG5JTBU01DU4UW
YEGSCYKFFQQRMNWADAERFCSEQAKGGH
>22 uaccno=FG5JTBU01DCUCP
CYKLFQQKMNWADAERFCSEQAKGGHLVFIENSGEGEFV
>23 uaccno=FGX3UK402ILOR5
DCPPDWSSYERHCYRVFQRKMNWANAERWCAQQYKESHLVSFH
>24 uaccno=FG5JTBU01D4TQK
DCPPDWSSYERHCYRVFQRKMNWANAERWCAQQYKESHLVSF
>25 uaccno=FGX3UK402I4XPF
DCLSGWSSYEGSCYKFFQQRMNWADAERF
>26 uaccno=FGX3UK402INU5X
DCLSGWSSYEGSCYKFFQQRMNWADAERF
>27 uaccno=FG5JTBU01BJT9A
YKFFQQKKSWDGVNWDDAERFCSEQAKGGHLVSIESDEEASFVAQLVAPNI
>28 uaccno=FGX3UK402IMBLJ
DCPSGWSPYEGSCYKLFKKEMNWADAESLCALRA
>29 uaccno=FGX3UK402FRG38
DCPSGWSSYEGHCYRFFKESKNWADAERF
>30 uaccno=FG5JTBU01AIKXR
DCPPDWSSYERHCYRVFQRKMNWANAERWCAQQYK
>31 uaccno=FGX3UK402G2AKE
DCPSGWSSYEGHCYRFFKESKNWADAE
>32 uaccno=FGX3UK402I6MAC
DCPSGWSSYEGHCYRFFKESKNWADAE
>33 uaccno=FGX3UK402I8DQO
DCPPDWSSYERHCYRVFQRKMNWANAERWC
>34 uaccno=FGX3UK402JQL4P
ECPSDWSTHGQYCYKFFQQKKSWDGVNWDDAERFCSEQVEG
>35 uaccno=FGX3UK402G4RFX
DCLSGWSSYEGSCYKVFKERMNWEDAE
>36 uaccno=FGX3UK402GMWHD
DCLSGWSSYEGSCYKVFKERMNWEDAE
>37 uaccno=FGX3UK402F8M9D
DCLSGWSSYEGSCYKVFKERMNWEDAE
>38 uaccno=FGX3UK402JJXKY
DCPPDWSSYERHCYRVFQRKMNWANAERW
>39 uaccno=FGX3UK402GE4RM
DCPPDWSSYERHCYRVFQRKMNWANAERW
>40 uaccno=FG5JTBU01DPK38
KFFQQKKSWDGVNWDDAERFCSEQAKGGHLGLYRKRR
>41 uaccno=FGX3UK402FRQ85
EGSCYKFFQQRMNWADAER
>42 uaccno=FGX3UK402FRQ85
DCLSGWSSY
>43 uaccno=FGX3UK402GLINR
ECPSDWSTHRQYCYKFFQQKRSW
>44 uaccno=FGX3UK402I4GP7
DCPPDWSSYERHCYRVFQRK
>45 uaccno=FGX3UK402J5438
DCPPDWSSYERHCYRVFQRK
>46 uaccno=FG5JTBU01DUHAR
SSYEGSCYKVFKERMNWG
>47 uaccno=FGX3UK402GL55O
ECPSDWSTHGQYCYKFFQQK
>48 uaccno=FGX3UK402GSK7H
DCPPDWSSYERHCYRVFQT
>49 uaccno=FG5JTBU01COJIU
ECPSDWSTHGQYCYKFFQQR
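A minimal plain-Perl sketch (no BioPerl) of loading a FASTA file like the one attached into a hash, assuming each header line starts with `>` followed by the record number. It keys on that number rather than the uaccno, because some uaccno values occur twice in this file (records 7 and 8 both carry FGX3UK402ILAN8). The sub name `read_fasta` is hypothetical; since you are already using BioPerl, `Bio::SeqIO` with `-format => 'fasta'` is the usual alternative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read FASTA lines into a hash: first header token (e.g. "7") => sequence.
# Keyed on the leading number, not the uaccno, because some uaccno values
# occur twice in this file (records 7 and 8, for example).
sub read_fasta {
    my @lines = @_;
    my (%seq, @order, $id);
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^\s*$/;          # ignore blank lines
        if ($line =~ /^>(\S+)/) {          # header: start a new record
            $id = $1;
            push @order, $id;
            $seq{$id} = '';
        }
        elsif (defined $id) {              # sequence line for current record
            $line =~ s/\s+//g;
            $seq{$id} .= $line;            # a sequence may span several lines
        }
    }
    return (\%seq, \@order);
}

# Run as: perl read_fasta.pl input.fasta  (prints ID and sequence length)
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    my ($seq, $order) = read_fasta(<$fh>);
    close $fh;
    print "$_\t", length($seq->{$_}), "\n" for @$order;
}
```

The `@order` array is kept alongside the hash because Perl hashes do not preserve insertion order.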
blast1.pl
Description: Perl program
-- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/