Thank you very much for your answers. Yesterday I was so exhausted that my
brain wasn't working and I couldn't explain what I was looking for.
I am actually trying to write a Perl script that compares sequences according
to the identity index of the BLAST program (bioinformatics). My input is a
FASTA file (http://en.wikipedia.org/wiki/FASTA_format) with hundreds of
sequences. I compare each of them to every other one (the BLAST program does
this) in order to get those that are < 95% homologous.
In the end I get a two-column file listing the numbers of the sequences with
< 95% homology. Logically, if sequence 1 has < 95% homology with sequence
7, then sequence 7 has < 95% homology with sequence 1, so I get redundant pairs.
I'm looking for a Perl script to remove that redundancy.
I suppose I have to do this with hashes, but I don't know enough about
them.
Thank you very much for your help.
I attach an input file and blast1.pl.
Thanks a lot.

P.S. Sorry!! I was trying this with BioPerl!
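
One way the hash idea could look, as a minimal sketch only: each pair is
turned into a key with the smaller number first, so "1 7" and "7 1" map to
the same key and only the first occurrence is kept. The sub name
unique_pairs and the sample lines (taken from the quoted example below) are
assumptions for illustration; the real two-column file may need a different
pattern to pull out the sequence numbers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the input lines with reciprocal pairs removed.
# A line naming sequences (7, 1) is dropped if (1, 7) was already seen.
sub unique_pairs {
    my @lines = @_;
    my %seen;    # key "smaller,larger" => number of times the pair was seen
    my @kept;
    for my $line (@lines) {
        # Take the first two integers on the line, whatever surrounds them.
        my @ids = $line =~ /(\d+)/g;
        next unless @ids >= 2;    # skip headers and blank lines
        my ($small, $large) = sort { $a <=> $b } @ids[0, 1];
        push @kept, $line unless $seen{"$small,$large"}++;
    }
    return @kept;
}

# Sample data in the format of the quoted example (an assumption about the
# real file layout).
my @example = (
    'File A           File B',
    'uid = 1         uid = 4',
    'uid = 2         uid = 3',
    'uid = 3         uid = 2',
    'uid = 4         uid = 1',
);

print "$_\n" for unique_pairs(@example);
```

With the sample data this keeps the "1 / 4" and "2 / 3" lines and drops
their mirror images. To run it on a real file, the @example list could be
replaced by the lines read from that file.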

2010/9/16 Jordi Durban <jordi.dur...@gmail.com>

> Hi all!
> I would like to ask you something. I have a file as follows:
> File A           File B
> uid = 1         uid = 4
> uid = 2         uid = 3
> uid = 3         uid = 2
> uid = 4         uid = 1
>
> I'm trying to write a Perl script that keeps only one copy of the lines whose
> numbers are the same regardless of which column they are in. That is, in the
> example, I want the second and third lines to appear only once.
> Thank you
>
> --
> Jordi
>
>


-- 
Jordi
>1 uaccno=FGX3UK402FFRW9
CLSGWSSYEGSCYKFFQQRMNWADAERFCSEQAKGGHLVSFQSDGETDFVVKLVTKNIQSRDLYAWIGLRVQNKEKQCS
>2 uaccno=FGX3UK402ICJCT
CLSGWSSYEGSCYKFFQQRMNWADAERFCSEQAKGGHLVSFQSDGETDFVVKLVTKNIQSRDLYAWIGLRVQNKEKQCS
>3 uaccno=FGX3UK402IC6J3
CLSGWSSYEGSCYKFFQQRMNWADAERFCSEQAKGGHLVSFQSDGETDFVVKLVTKNIQSRDLYAWIGLRVQNKEKQCS
>4 uaccno=FGX3UK402JZ9PW
CLSGWSSYEGSCYKFFQQRMNWADAERFCSEQAKGGHLVSFQSDGETDFVVKLVTKNIQSRDLYAWIGLRVQNKEKQCS
>5 uaccno=FGX3UK402FMWRM
CLSGWSSYEGSCYKFFQQRMNWADAERFCSEQAKGGHLVSFQSDGETDFVVKLVTKNIQSRDLYAWIGLRVQNKEKQCS
>6 uaccno=FGSMDPN09FKBBG
CLSGWSSYEGSCYKFFQQRMNWADAERFCSEQAKGGHLVSFQSDGETDFVVKLVTKNIQSRDLYAWIGLRVQNKEKQCS
>7 uaccno=FGX3UK402ILAN8
ADAERFCSEQAKGGHLVSIERFGREDEFVSNLITKNIQRGVSYVWIGMRIQNKEKQCSS
>8 uaccno=FGX3UK402ILAN8
WSSYEGHCYRFFKEIKKLGRMQR
>9 uaccno=FGX3UK402GIKOW
DCPSGWSSYEGHCYRFFKESKNWADAERFCSEQAKGGHLVSIERFGREDEFV
>10 uaccno=FGX3UK402FLOOP
DCLSGWSSYEGSCYKFFQQRMNWADAERFCSEQAKGGHLVSFQSDGETDFV
>11 uaccno=FGX3UK402JFTTO
NWDDAERFCSEQAKGGHLVSIESDEEASFVAQLVAPNIGEFTYYVWIGLRAEGKGQQCSSKWSDGSCVCYE
>12 uaccno=FGX3UK402IQ4I1
CYKFFQQKKSWDGVNWDDAERFCSEQAKGGHLVSIESDEEASFVAQLVAPNIGEFTYYVWIGLRAEGKGQQCSS
>13 uaccno=FGX3UK402JDACN
VNWDDAERFCSEQAKGGHLVSIESDEEASFVAQLVAPNIGEFTYYVWIGLRAEGKGQQCSSKWSDG
>14 uaccno=FHA3VDO14IHU72
DCLSGWSSYEGSCYKFFQQRMNWADAERFCSEQAKGGHL
>15 uaccno=FGX3UK402GO9M4
PRGWSSYEGHCYRFFKESKNWADAERFCSEQAKGGHLL
>16 uaccno=FGX3UK402HAG8Y
DCLSGWSSYEGSCYKVFKERMNWEDAEEFCTQQQTGGHLVSFQSKEEADF
>17 uaccno=FGX3UK402JGQW2
ERHCYRVFQRKMNWANAERWCAQQYKESHLVSFHSSEEVDFVVSLTYPILKATLVWTGLSNIWKECRLEWSDGTKVN
>18 uaccno=FGX3UK402I4A8M
VTWDDAERFCSEQAKGGHLVSIESDEEASFVAQLVAPNIGEFTYYVWIGLE
>19 uaccno=FGX3UK402I4A8M
RAEGKGQQCSSKWSDG
>20 uaccno=FGX3UK402IXGJF
DCPSGWSPYEGSCYKLFKKEMNWADAESLCALQRKECHLVSFHSSEEVDF
>21 uaccno=FG5JTBU01DU4UW
YEGSCYKFFQQRMNWADAERFCSEQAKGGH
>22 uaccno=FG5JTBU01DCUCP
CYKLFQQKMNWADAERFCSEQAKGGHLVFIENSGEGEFV
>23 uaccno=FGX3UK402ILOR5
DCPPDWSSYERHCYRVFQRKMNWANAERWCAQQYKESHLVSFH
>24 uaccno=FG5JTBU01D4TQK
DCPPDWSSYERHCYRVFQRKMNWANAERWCAQQYKESHLVSF
>25 uaccno=FGX3UK402I4XPF
DCLSGWSSYEGSCYKFFQQRMNWADAERF
>26 uaccno=FGX3UK402INU5X
DCLSGWSSYEGSCYKFFQQRMNWADAERF
>27 uaccno=FG5JTBU01BJT9A
YKFFQQKKSWDGVNWDDAERFCSEQAKGGHLVSIESDEEASFVAQLVAPNI
>28 uaccno=FGX3UK402IMBLJ
DCPSGWSPYEGSCYKLFKKEMNWADAESLCALRA
>29 uaccno=FGX3UK402FRG38
DCPSGWSSYEGHCYRFFKESKNWADAERF
>30 uaccno=FG5JTBU01AIKXR
DCPPDWSSYERHCYRVFQRKMNWANAERWCAQQYK
>31 uaccno=FGX3UK402G2AKE
DCPSGWSSYEGHCYRFFKESKNWADAE
>32 uaccno=FGX3UK402I6MAC
DCPSGWSSYEGHCYRFFKESKNWADAE
>33 uaccno=FGX3UK402I8DQO
DCPPDWSSYERHCYRVFQRKMNWANAERWC
>34 uaccno=FGX3UK402JQL4P
ECPSDWSTHGQYCYKFFQQKKSWDGVNWDDAERFCSEQVEG
>35 uaccno=FGX3UK402G4RFX
DCLSGWSSYEGSCYKVFKERMNWEDAE
>36 uaccno=FGX3UK402GMWHD
DCLSGWSSYEGSCYKVFKERMNWEDAE
>37 uaccno=FGX3UK402F8M9D
DCLSGWSSYEGSCYKVFKERMNWEDAE
>38 uaccno=FGX3UK402JJXKY
DCPPDWSSYERHCYRVFQRKMNWANAERW
>39 uaccno=FGX3UK402GE4RM
DCPPDWSSYERHCYRVFQRKMNWANAERW
>40 uaccno=FG5JTBU01DPK38
KFFQQKKSWDGVNWDDAERFCSEQAKGGHLGLYRKRR
>41 uaccno=FGX3UK402FRQ85
EGSCYKFFQQRMNWADAER
>42 uaccno=FGX3UK402FRQ85
DCLSGWSSY
>43 uaccno=FGX3UK402GLINR
ECPSDWSTHRQYCYKFFQQKRSW
>44 uaccno=FGX3UK402I4GP7
DCPPDWSSYERHCYRVFQRK
>45 uaccno=FGX3UK402J5438
DCPPDWSSYERHCYRVFQRK
>46 uaccno=FG5JTBU01DUHAR
SSYEGSCYKVFKERMNWG
>47 uaccno=FGX3UK402GL55O
ECPSDWSTHGQYCYKFFQQK
>48 uaccno=FGX3UK402GSK7H
DCPPDWSSYERHCYRVFQT
>49 uaccno=FG5JTBU01COJIU
ECPSDWSTHGQYCYKFFQQR

Attachment: blast1.pl
Description: Perl program

