Hi,

Take a look at the Levenshtein (edit) distance: http://en.wikipedia.org/wiki/Levenshtein_distance
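In case it helps, here is a minimal sketch of that distance in plain Java (it is not BioJava code). The class name EditDistanceSketch and the method name distance are made up for illustration. Symbols are compared with equals(), so the sequences can use any alphabet. Note that it only gives a score (the number of edits), not a gapped alignment.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative Levenshtein distance over sequences of arbitrary symbols. */
final class EditDistanceSketch {

    /**
     * Returns the minimum number of single-symbol insertions, deletions and
     * substitutions needed to turn sequence a into sequence b.
     */
    static <T> int distance(List<T> a, List<T> b) {
        // prev holds the previous row of the dynamic-programming table,
        // curr the row being filled in; only two rows are needed.
        int[] prev = new int[b.size() + 1];
        int[] curr = new int[b.size() + 1];

        // Turning an empty prefix of a into the first j symbols of b costs j insertions.
        for (int j = 0; j <= b.size(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.size(); i++) {
            // Turning the first i symbols of a into an empty sequence costs i deletions.
            curr[0] = i;
            for (int j = 1; j <= b.size(); j++) {
                int substitution = a.get(i - 1).equals(b.get(j - 1)) ? 0 : 1;
                int viaDelete = prev[j] + 1;
                int viaInsert = curr[j - 1] + 1;
                int viaSubstitute = prev[j - 1] + substitution;
                curr[j] = Math.min(Math.min(viaDelete, viaInsert), viaSubstitute);
            }
            // Reuse the arrays: the row just filled becomes the previous row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.size()];
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList("A", "B", "C", "E");
        List<String> second = Arrays.asList("A", "D", "E");
        System.out.println(distance(first, second)); // prints 2
    }
}
```

For the sequences from the question, ABCE and ADE, the result is 2 (one substitution plus one deletion).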
Regards,
khalil

On 19 Sep 2011, at 18:00, [email protected] wrote:

> Send Biojava-l mailing list submissions to
>     [email protected]
>
> To subscribe or unsubscribe via the World Wide Web, visit
>     http://lists.open-bio.org/mailman/listinfo/biojava-l
> or, via email, send a message with subject or body 'help' to
>     [email protected]
>
> You can reach the person managing the list at
>     [email protected]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Biojava-l digest..."
>
>
> Today's Topics:
>
>    1. Re: [Biojava-dev] A question about multiple alignment
>       (Andreas Prlic)
>    2. UniprotParser (Saif Ur-Rehman)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sun, 18 Sep 2011 16:50:27 -0700
> From: Andreas Prlic <[email protected]>
> Subject: Re: [Biojava-l] [Biojava-dev] A question about multiple
>     alignment
> To: Shahab Kamali <[email protected]>
> Cc: [email protected]
> Message-ID:
>     <calthepxebhovspzc3yvu1_+15ourceyezsyaux8qm1mnh-d...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi Shahab,
>
> Sounds like you want to use an identity matrix for the alignment..
>
> Andreas
>
> On Sat, Sep 17, 2011 at 3:28 PM, Shahab Kamali <[email protected]>
> wrote:
>> Thanks Andreas,
>> I want two components that have different names to have 0 alignment score.
>> My application is not about bio-compounds, so I can use anything else rather
>> than ProteinSequence and AminoAcidCompound. I just need to align sequences
>> of arbitrary alphabets. Could you suggest me a solution please?
>> Thanks a lot,
>> Shahab
>>
>> Quoting Andreas Prlic <[email protected]>:
>>
>>> Hi Shahab,
>>>
>>> did you take a look at the substitution matrix, if it is scoring your
>>> sequences according to your expectation? Looks like in your
>>> theoretical example the alignment of B and D is favorable, i.e. it has
>>> a positive alignment score..
>>>
>>> Andreas
>>>
>>>
>>> On Fri, Sep 16, 2011 at 10:56 AM, Shahab Kamali <[email protected]>
>>> wrote:
>>>>
>>>> Hi,
>>>> I am using BioJava in a pattern mining project. I want to align a set of
>>>> relatively short sequences. For example to align {"ABCE", "ABCE", "ADE",
>>>> "ADE"}.
>>>>
>>>> This is a part of my code:
>>>>
>>>> SubstitutionMatrix<AminoAcidCompound> matrix =
>>>>     new SimpleSubstitutionMatrix<AminoAcidCompound>();
>>>> GuideTree<ProteinSequence, AminoAcidCompound> gt =
>>>>     new GuideTree<ProteinSequence, AminoAcidCompound>(lst,
>>>>         Alignments.getAllPairsScorers(lst,
>>>>             Alignments.PairwiseSequenceScorerType.GLOBAL,
>>>>             new SimpleGapPenalty((short)0, (short)0), matrix));
>>>> Profile<ProteinSequence, AminoAcidCompound> profile =
>>>>     Alignments.getProgressiveAlignment(gt,
>>>>         Alignments.ProfileProfileAlignerType.GLOBAL,
>>>>         new SimpleGapPenalty((short)0, (short)0), matrix);
>>>>
>>>> The result of the above code is:
>>>> ABCE
>>>> ABCE
>>>> AD-E
>>>> AD-E
>>>>
>>>> But what I need is
>>>> A-BCE
>>>> A-BCE
>>>> AD--E
>>>> AD--E
>>>> or
>>>> ABC-E
>>>> ABC-E
>>>> A--DE
>>>> A--DE
>>>>
>>>> Do you have any suggestion?
>>>> Thanks,
>>>> Shahab
>>>>
>>>> _______________________________________________
>>>> biojava-dev mailing list
>>>> [email protected]
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>>
>>>
>>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 19 Sep 2011 11:09:46 +0100
> From: Saif Ur-Rehman <[email protected]>
> Subject: [Biojava-l] UniprotParser
> To: [email protected]
> Message-ID:
>     <CABpZy=wuxjm42nvjmsetwx463ht+b5rljwc2kp0r00rdity...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Dear all,
>
> I am having issues with the BioJava UniProt parser as detailed below:
>
> Code:
>
> BufferedReader br = new BufferedReader(new FileReader( files[index]));
> Namespace ns = RichObjectFactory.getDefaultNamespace();
> RichSequenceIterator iterator = RichSequence.IOTools.readUniProt(br, ns);
> while(iterator.hasNext())
> {
>     try
>     {
>         RichSequence rs=iterator.nextRichSequence();
>     }
>
>     catch (NoSuchElementException e)
>     {
>
>     }
>     catch (BioException e)
>     {
>         e.printStackTrace();
>     }
>
>
> The file I am using is downloaded from the link:
>
> ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/taxonomic_divisions/uniprot_sprot_fungi.dat.gz
>
>
> The problem is that the parser works for a subset of the IDs within the file
> and on others throws an exception.
>
> Sample Exception stack trace:
>
> *** Start of trace *************************
>
>     at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
>     at uniprot.mp.main(mp.java:161)
> Caused by: org.biojava.bio.seq.io.ParseException:
>
> A Exception Has Occurred During Parsing.
> Please submit the details that follow to [email protected] or post a bug
> report to http://bugzilla.open-bio.org/
>
> Format_object=org.biojavax.bio.seq.io.UniProtFormat
> Accession=P53031
> Id=
> Comments=
> Parse_block=RN [1]RP NUCLEOTIDE SEQUENCE [GENOMIC DNA].RC STRAIN=NCYC
> 2512;RX MEDLINE=97082501; PubMed=8923737;
> DOI=10.1002/(SICI)1097-0061(199610)12:13<1321::AID-YEA27>3.0.CO;2-6;RA
> Rodriguez P.L., Ali R., Serrano R.;RT "CtCdc55p and CtHa13p: two putative
> regulatory proteins from Candida
> tropicalis with long acidic domains.";RL Yeast 12:1321-1329(1996).
> Stack trace follows ....
>
>
>     at org.biojavax.bio.seq.io.UniProtFormat.readRichSequence(UniProtFormat.java:615)
>     at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
>     ... 1 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
>     at org.biojavax.bio.seq.io.UniProtFormat.readRichSequence(UniProtFormat.java:486)
>     ... 2 more
> org.biojava.bio.BioException: Could not read sequence
>     at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
>     at uniprot.mp.main(mp.java:161)
> Caused by: org.biojava.bio.seq.io.ParseException: Name has not been supplied
>
> ********End of trace**********************************
>
> An example of an Id that worked is:
>
> ZYM1_SCHPO
>
> while an ID that didn't work is:
>
> ZUO1_YEAST
>
> Thanks a lot in advance.
>
> Cheers,
> Saif
>
>
> --
> Saif Ur-Rehman
>
> Centre for Evolution, Genes and Genomics
> Harold Mitchell Building
> University of St Andrews
> St Andrews
> Fife
> KY16 9TH
> UK
>
> Tel: +44 131 5572556
> Fax: +44 1334 463366
>
>
> ------------------------------
>
> _______________________________________________
> Biojava-l mailing list - [email protected]
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>
>
> End of Biojava-l Digest, Vol 104, Issue 6
> *****************************************
