The code is as follows. It is taken from the BioJava CookBook with small 
modifications. The following method is called from another class. It takes 
a list of FASTA file names as its argument.

public void MSAFromFiles(List<String> ids) throws Exception {
    List<ProteinSequence> lst = new ArrayList<ProteinSequence>();
    for (String id : ids) {
        // Each id is the name of a FASTA file.
        ProteinSequence pSeq = getSequenceFromFiles(id);
        lst.add(pSeq);
        //System.out.println("seq==" + pSeq);
    }
    // profile is declared elsewhere in the class.
    profile = Alignments.getMultipleSequenceAlignment(lst);
}

The getSequenceFromFiles() method is given below:

private ProteinSequence getSequenceFromFiles(String inputFile) throws Exception {
    ProteinSequence seq = null;
    //System.out.println("inputFile===" + inputFile);
    FileInputStream is = new FileInputStream(inputFile);
    LinkedHashMap<String, ProteinSequence> proteinSequences;
    try {
        FastaReader<ProteinSequence, AminoAcidCompound> fastaReader =
            new FastaReader<ProteinSequence, AminoAcidCompound>(
                is,
                new GenericFastaHeaderParser<ProteinSequence, AminoAcidCompound>(),
                new ProteinSequenceCreator(
                    AminoAcidCompoundSet.getAminoAcidCompoundSet()));
        proteinSequences = fastaReader.process();
    } finally {
        // Close the stream even if parsing fails.
        is.close();
    }
    //System.out.println("proteinSequences=" + proteinSequences);
    //LinkedHashMap<String, ProteinSequence> a =
    //    FastaReaderHelper.readFastaProteinSequence(new File(fileName));
    // Note: if the file holds several records, only the last one is returned.
    for (Entry<String, ProteinSequence> entry : proteinSequences.entrySet()) {
        seq = new ProteinSequence(entry.getValue().getSequenceAsString());
        seq.setAccession(entry.getValue().getAccession());
        //System.out.println("Inside getSequenceFromFile=" + seq);
        //FastaReaderHelper.readFastaDNASequence for DNA sequences
    }
    return seq;
}
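
Because getSequenceFromFiles() overwrites seq on every loop iteration, a file
with several FASTA records contributes only its last record to the alignment.
The sketch below shows one way to keep every record. It is a plain-Java
stand-in, not BioJava's FastaReader; the class name FastaAllRecords and the
method readAll are hypothetical, and the sample data is made up.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;

public class FastaAllRecords {

    /**
     * Reads every FASTA record from the source and returns them in file
     * order, keyed by the header line (without the leading '>').
     */
    public static Map<String, String> readAll(Readable in) {
        Map<String, String> records = new LinkedHashMap<String, String>();
        Scanner scanner = new Scanner(in);
        String header = null;
        StringBuilder residues = new StringBuilder();
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine().trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            if (line.startsWith(">")) {
                // A new header closes the previous record, if there was one.
                if (header != null) {
                    records.put(header, residues.toString());
                }
                header = line.substring(1).trim();
                residues.setLength(0);
            } else if (header != null) {
                residues.append(line); // sequence lines are concatenated
            }
        }
        if (header != null) {
            records.put(header, residues.toString()); // store the last record
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up two-record FASTA input.
        String fasta = ">P1\nMKLV\nTAG\n>P2\nMKV\n";
        System.out.println(readAll(new StringReader(fasta)));
        // prints {P1=MKLVTAG, P2=MKV}
    }
}
```

With BioJava itself, the same effect would come from adding every value of
the map returned by fastaReader.process() to the list, rather than returning
a single ProteinSequence.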

After obtaining the Profile object, I wrote the following code to display the 
number of gaps:

List<AlignedSequence<ProteinSequence, AminoAcidCompound>> listOfalSeq =
    profile.getAlignedSequences();

AlignedSequence<ProteinSequence, AminoAcidCompound> alSeq;
int noOfcompounds = 0;
int numOfGaps = 0;
StringBuilder html = new StringBuilder(
    "<html><body><table border=1>"
    + "<tr><td>Accession Id</td><td>Number of gaps</td></tr>");
for (int i = 0; i < listOfalSeq.size(); i++) {
    alSeq = listOfalSeq.get(i);
    // accessionId is declared elsewhere in the class.
    accessionId = alSeq.getAccession().getID();
    noOfcompounds = alSeq.countCompounds();
    numOfGaps = alSeq.getNumGaps();
    // One table row per aligned sequence.
    html.append("<tr><td>");
    html.append(accessionId);
    html.append("</td><td>");
    html.append(numOfGaps);
    html.append("</td></tr>");
    //System.out.println("accessionId==" + accessionId);
    //pSeq = new ProteinSequence(seq.getSequenceAsString(), seq.getCompoundSet());
    //pSeq.setAccession(seq.getAccession());
    //multipleSequenceAlignment.addAlignedSequence(pSeq);
}
html.append("</table></body></html>");
setText(html.toString());

setText() is a method of JEditorPane (or JTextPane).
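
To cross-check the value reported by getNumGaps(), the '-' characters can be
counted directly in the aligned string (for example the one returned by
alSeq.getSequenceAsString()). The sketch below is plain Java with no BioJava
dependency; the class name GapCount and the sample string are made up. It
counts both individual gap characters and contiguous runs of gaps. Whether
getNumGaps() reports the second number is only an assumption here, not
something confirmed against the BioJava source.

```java
public class GapCount {

    /** Counts every '-' character in an aligned sequence string. */
    public static int countGapChars(String aligned) {
        int count = 0;
        for (int i = 0; i < aligned.length(); i++) {
            if (aligned.charAt(i) == '-') {
                count++;
            }
        }
        return count;
    }

    /** Counts contiguous runs of '-'; each run counts once, whatever its length. */
    public static int countGapRuns(String aligned) {
        int runs = 0;
        boolean inGap = false;
        for (int i = 0; i < aligned.length(); i++) {
            boolean gap = aligned.charAt(i) == '-';
            if (gap && !inGap) {
                runs++; // a new run starts here
            }
            inGap = gap;
        }
        return runs;
    }

    public static void main(String[] args) {
        String aligned = "MK--LV---TA-G"; // made-up aligned sequence
        System.out.println("gap characters = " + countGapChars(aligned)); // 6
        System.out.println("gap runs       = " + countGapRuns(aligned));  // 3
    }
}
```

If the two numbers printed for a real alignment bracket the discrepancy
described in the original question (more '-' characters than reported gaps),
that would point to a difference in what is being counted rather than to a
wrong alignment.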

Tariq, Phd Scholar

> From: [email protected]
> Subject: Biojava-l Digest, Vol 102, Issue 4
> To: [email protected]
> Date: Thu, 7 Jul 2011 12:00:04 -0400
> 
> Today's Topics:
> 
>    1. BioJava Gene Hierarchies (Daniel Di Giulio)
>    2. Page creation in Biojava (Muhammad Tariq Pervez)
>    3. Re: No. of gaps in aligned sequences (Andreas Prlic)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Wed, 6 Jul 2011 12:07:07 -0400
> From: Daniel Di Giulio <[email protected]>
> Subject: [Biojava-l] BioJava Gene Hierarchies
> To: [email protected]
> Message-ID:
>       <CAEb=yspftxrqtgyqfuebsgpjnhqtnorp32wafu91seso-zr...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
> 
> Hello,
> 
> I'm currently using BioJava to upgrade a eukaryotic gene finder program
> (EVIGAN) to be compatible with the GFF3 formats.  Your BioJava genome
> package is very useful, but I had a question about implementing a sort of
> gene hierarchy from parsed files.  Essentially, I would like to be able to
> read in a GFF3 file of a region of interest, parse out the CDS segments, and
> then create a hierarchy of genes from the attribute tags, which I can then
> employ later in my program.  It seems as if the
> org.biojava3.genome.parsers.gff class is good for this, but there doesn't
> seem to be a data structure for organizing related "Feature" objects into a
> higher grouping based on similar attributes.  Does anyone know of a way to
> implement this, or a package within BioJava which could be useful?
> 
> Thanks a lot,
> Daniel
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Thu, 7 Jul 2011 08:48:08 +0000
> From: Muhammad Tariq Pervez <[email protected]>
> Subject: [Biojava-l] Page creation in Biojava
> To: <[email protected]>
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset="iso-8859-1"
> 
> 
> Hi, All
> I want to contribute in BioJava CookBook. I have already login id/account. I 
> can create internal/external links. But further what to do. How can i create 
> a page and link to the internal/external link.
> 
> Regards.
>  
> 
> Tariq, Phd Scholar
>                                         
> 
> 
> ------------------------------
> 
> Message: 3
> Date: Thu, 7 Jul 2011 08:10:53 -0700
> From: Andreas Prlic <[email protected]>
> Subject: Re: [Biojava-l] No. of gaps in aligned sequences
> To: Muhammad Tariq Pervez <[email protected]>
> Cc: [email protected], [email protected]
> Message-ID:
>       <CALthepw15crxKRkk5sYOdRruMCt_3xBrTdNa4=w7icvy6kk...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
> 
> Hi Tariq,
> 
> Can you send us the sample code / DB accession IDs so we can try to
> reproduce this?
> 
> Andreas
> 
> On Wed, Jul 6, 2011 at 4:37 AM, Muhammad Tariq Pervez
> <[email protected]> wrote:
> >
> >
> > Hi, Dear all,
> > I am working on the development of MSA application using BioJava. I want to 
> > make clear a thing. It is that when two or more protein sequences are 
> > aligned the '-' is shown more times in an aligned sequence than the gaps 
> > display by the method of alSeq.getNumGaps(). 'alSeq' is an aligned 
> > sequence. For example, if there are actual 50 '-' in an aligned sequence 
> > but the method shows it only 30. What is the difference between these two 
> > results.
> >
> > Best Regards
> >
> >
> > Tariq, Phd Scholar
> >
> > _______________________________________________
> > Biojava-l mailing list - [email protected]
> > http://lists.open-bio.org/mailman/listinfo/biojava-l
> >
> 
> 
> 
> ------------------------------
> 
> 
> End of Biojava-l Digest, Vol 102, Issue 4
> *****************************************
                                          