The code is as follows. It is taken from the BioJava Cookbook with minor
modifications. The following method is called from another class and takes
a list of FASTA file names as its argument.



public void MSAFromFiles(List<String> ids) throws Exception {
    // Read one protein sequence from each input file.
    List<ProteinSequence> lst = new ArrayList<ProteinSequence>();
    for (String id : ids) {
        ProteinSequence pSeq = getSequenceFromFiles(id);
        lst.add(pSeq);
        //System.out.println("seq==" + pSeq);
    }
    // profile is declared elsewhere in the class and holds the alignment.
    profile = Alignments.getMultipleSequenceAlignment(lst);
}


The getSequenceFromFiles() method is given below:



private ProteinSequence getSequenceFromFiles(String inputFile) throws Exception {
    ProteinSequence seq = null;
    //System.out.println("inputFile===" + inputFile);
    FileInputStream is = new FileInputStream(inputFile);
    LinkedHashMap<String, ProteinSequence> proteinSequences;
    try {
        FastaReader<ProteinSequence, AminoAcidCompound> fastaReader =
            new FastaReader<ProteinSequence, AminoAcidCompound>(
                is,
                new GenericFastaHeaderParser<ProteinSequence, AminoAcidCompound>(),
                new ProteinSequenceCreator(AminoAcidCompoundSet.getAminoAcidCompoundSet()));
        proteinSequences = fastaReader.process();
    } finally {
        // Close the stream even if parsing fails.
        is.close();
    }
    //System.out.println("proteinSequences=" + proteinSequences);
    //LinkedHashMap<String, ProteinSequence> a = FastaReaderHelper.readFastaProteinSequence(new File(fileName));
    //FastaReaderHelper.readFastaDNASequence for DNA sequences

    // If the file holds several records, the last one is returned.
    for (Entry<String, ProteinSequence> entry : proteinSequences.entrySet()) {
        seq = new ProteinSequence(entry.getValue().getSequenceAsString());
        seq.setAccession(entry.getValue().getAccession());
        //System.out.println("Inside getSequenceFromFile=" + seq);
    }
    return seq;
}

After getting the Profile object, I wrote the following code to display the
number of gaps:



List<AlignedSequence<ProteinSequence, AminoAcidCompound>> listOfalSeq =
    profile.getAlignedSequences();
AlignedSequence<ProteinSequence, AminoAcidCompound> alSeq;
int noOfcompounds = 0;
int numOfGaps = 0;
StringBuilder html = new StringBuilder(
    "<html><body><table border=1><tr><td>Accession Id</td><td>Number of gaps</td></tr>");

// Add one table row per aligned sequence.
for (int i = 0; i < listOfalSeq.size(); i++) {
    alSeq = listOfalSeq.get(i);
    // accessionId is declared elsewhere in the class.
    accessionId = alSeq.getAccession().getID();
    noOfcompounds = alSeq.countCompounds();
    numOfGaps = alSeq.getNumGaps();
    html.append("<tr><td>");
    html.append(accessionId);
    html.append("</td><td>");
    html.append(numOfGaps);
    html.append("</td></tr>");
    //System.out.println("accessionId==" + accessionId);
    //pSeq=new ProteinSequence(seq.getSequenceAsString(),seq.getCompoundSet());
    //pSeq.setAccession(seq.getAccession());
    //multipleSequenceAlignment.addAlignedSequence(pSeq);
}
html.append("</table></body></html>");
setText(html.toString());



setText() is a method of JEditorPane or JTextPane.
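To make the two numbers easier to compare, here is a minimal sketch that counts gaps in two different ways on a plain aligned string: every '-' character, and every contiguous run of '-' characters. The class GapCounter and its two methods are hypothetical helpers written for this illustration and are not part of BioJava. Whether getNumGaps() corresponds to the second count is only an assumption that has not been confirmed in this thread.

```java
// Hypothetical helper for illustration only; not part of BioJava.
final class GapCounter {

    private GapCounter() {
    }

    /** Counts every '-' character in the aligned string. */
    static int countGapCharacters(String aligned) {
        int count = 0;
        for (int i = 0; i < aligned.length(); i++) {
            if (aligned.charAt(i) == '-') {
                count++;
            }
        }
        return count;
    }

    /** Counts maximal runs of consecutive '-' characters. */
    static int countGapRuns(String aligned) {
        int runs = 0;
        boolean inGap = false;
        for (int i = 0; i < aligned.length(); i++) {
            boolean isGap = aligned.charAt(i) == '-';
            // A new run starts when a '-' follows a non-gap position.
            if (isGap && !inGap) {
                runs++;
            }
            inGap = isGap;
        }
        return runs;
    }
}
```

Both methods could be called on alSeq.getSequenceAsString() inside the loop above, assuming the aligned string uses '-' as its gap symbol. If the run count matches the value shown in the table, that would suggest the difference is between gap openings and gap positions.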


Tariq, Phd Scholar

Muhammad Tariq Pervez

Assistant Professor,
Department of Computer Science
Virtual University of Pakistan, Lahore
Tel: (042) 9203114-7  
URL: www.vu.edu.pk
Mobile: +923364120541, +923214602694


> Date: Thu, 7 Jul 2011 08:10:53 -0700
> Subject: Re: [Biojava-l] No. of gaps in aligned sequences
> From: [email protected]
> To: [email protected]
> CC: [email protected]; [email protected]
> 
> Hi Tariq,
> 
> Can you send us the sample code / DB accession IDs so we can try to
> reproduce this?
> 
> Andreas
> 
> On Wed, Jul 6, 2011 at 4:37 AM, Muhammad Tariq Pervez
> <[email protected]> wrote:
> >
> >
> > Hi, Dear all,
> > I am developing an MSA application using BioJava and would like to
> > clarify one thing. When two or more protein sequences are aligned, the
> > '-' character appears more often in an aligned sequence than the number
> > of gaps reported by alSeq.getNumGaps(), where 'alSeq' is an aligned
> > sequence. For example, an aligned sequence may contain 50 '-'
> > characters while the method reports only 30. What is the difference
> > between these two results?
> >
> > Best Regards
> >
> >
> > Tariq, Phd Scholar
> >
> > _______________________________________________
> > Biojava-l mailing list  -  [email protected]
> > http://lists.open-bio.org/mailman/listinfo/biojava-l
> >
                                          