Hi Sarah,
just a few comments / guesses:

maybe there are other approaches, but you could follow the FASTA format that BioJavaX understands, e.g.:

>gi|<identifier>|<namespace>|<accession>|<name> <description>

http://biojava.org/wiki/BioJava:BioJavaXDocs#Reading (chapter 8.2.1)

So, if your file looked like this:

>gi|0|namespace|null|1 0.9992
ASITENGGAEEESVAK
>gi|1|namespace|null|1 0.9953
ASITENGGAEEESVAK
.
.
.

you could use:

System.out.println("id: "+rich_seq.getIdentifier());
System.out.println("rank: "+rich_seq.getName());
System.out.println("probability: "+rich_seq.getDescription());
System.out.println("sequence: "+rich_seq.seqString());

to get the data.
However, 'rank' and 'probability' are really annotations of the sequence, so when processing the data (e.g. storing it in a database), one would store them as annotations.

- As for Java style / naming conventions for variables, camel case is recommended, e.g. richSeq instead of rich_seq.

- To get the alphabet:
System.out.println("alphabet: "+rich_seq.getAlphabet().getName());

- Also, maybe you should use the default namespace instead of null:
RichSequenceIterator rich_stream = RichSequence.IOTools.readFastaProtein(br,RichObjectFactory.getDefaultNamespace());

Cheers,
Felix


Gerster Sarah wrote:
> Hi!
>
> I'm trying to read peptides from a fasta file:
>
> >id|0|0.9992|1
> ASITENGGAEEESVAK
> >id|1|0.9953|1
> ASITENGGAEEESVAK
> >id|2|0.9998|1
> ASNASSAGDEVDNVATSSK
> >id|3|0.9998|1
> EAAAAEEPQPSDEGDVVAK
> >id|4|0.9998|1
> EAAAAEEPQPSDEGDVVAK
> ....
> I would like to have all peptides somewhere in the memory. I need their id,
> the sequence and the 2 numbers at the end (e.g. id = 0, probability = 0.9992,
> rank = 1 for the first entry in the file).
>
> I tried to use readFastaProtein... but I guess I don't use it right. Anyway,
> I get the sequences, but I don't get any of the other information I want...
>
> Here is my code:
> try
> {
>     BufferedReader br = new BufferedReader(new FileReader(file_name));
>     RichSequenceIterator rich_stream =
>         RichSequence.IOTools.readFastaProtein(br,null);
>     while(rich_stream.hasNext())
>     {
>         RichSequence rich_seq = rich_stream.nextRichSequence();
>         System.out.println(rich_seq.toString());
>         System.out.println(rich_seq.getAccession());
>         System.out.println(rich_seq.getAlphabet());
>         System.out.println(rich_seq.getAnnotation());
>         System.out.println(rich_seq.getName());
>         System.out.println(rich_seq.getDescription());
>         System.out.println(rich_seq.getIdentifier());
>         System.out.println(rich_seq.seqString());
>     }
> }
> catch(Exception e)
> {
>     System.err.println("Bug while reading the sequences from the FASTA file");
> }
>
>
> Here's the output (for the first entry in the fasta file):
> id|0:1/0.9992
> 0
> [EMAIL PROTECTED]
>
> 1
> null
> null
> ASITENGGAEEESVAK
>
>
> Can anyone tell me what's going wrong?
> Is there already a function to put all the sequences directly in the memory
> (like a HashSet) while reading them?
>
> Cheers
>
> Sarah
>
> _______________________________________________
> Biojava-l mailing list - [email protected]
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

--
Felix Dreher
Max Planck Institute for Molecular Genetics
Department of Vertebrate Genomics
Bioinformatics Group
Ihnestr. 73
D-14195 Berlin
Phone: +49 30 - 8413 1682
Mobile: +49 163 - 754 24 26
E-mail: [EMAIL PROTECTED]
www.molgen.mpg.de/~lh_bioinf
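
P.S. In case it helps, here is a rough sketch of how the header lines could be rewritten from the id|0|0.9992|1 form into the gi|... form mentioned above. It is plain Java and does not use BioJava at all; the class name PeptideHeaderConverter, its methods and the namespace string "namespace" are only my assumptions for illustration, and I assume every header has exactly the four fields id|<id>|<probability>|<rank>.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of BioJava) that rewrites peptide FASTA headers. */
final class PeptideHeaderConverter {

    /**
     * Turns "id|0|0.9992|1" (with or without a leading '>') into
     * "gi|0|<namespace>|null|1 0.9992", i.e. rank becomes the name
     * and probability becomes the description.
     */
    static String toBioJavaXHeader(String header, String namespace) {
        String body = header.startsWith(">") ? header.substring(1) : header;
        String[] fields = body.trim().split("\\|");
        if (fields.length != 4 || !fields[0].equals("id")) {
            throw new IllegalArgumentException("Unexpected header: " + header);
        }
        String id = fields[1];
        String probability = fields[2];
        String rank = fields[3];
        return "gi|" + id + "|" + namespace + "|null|" + rank + " " + probability;
    }

    /** Rewrites every header line of a FASTA file; sequence lines pass through unchanged. */
    static List<String> convert(List<String> lines, String namespace) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (line.startsWith(">")) {
                out.add(">" + toBioJavaXHeader(line, namespace));
            } else {
                out.add(line);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList(">id|0|0.9992|1", "ASITENGGAEEESVAK");
        for (String line : convert(in, "namespace")) {
            System.out.println(line);
        }
    }
}
```

For the first entry of the file this should print >gi|0|namespace|null|1 0.9992 followed by the unchanged sequence line, so the converted file could then be passed to readFastaProtein as before.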

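As for keeping all peptides in memory: I don't know of a ready-made BioJava function for that, so below is only a minimal plain-Java sketch that reads the original id|<id>|<probability>|<rank> headers directly and stores the entries in a map keyed by id. Everything in it (PeptideFastaReader, Peptide, read, parse) is a hypothetical name of mine, not BioJava API, and it assumes the ids are unique integers.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical plain-Java reader (not BioJava) for headers like ">id|0|0.9992|1". */
final class PeptideFastaReader {

    /** One FASTA entry: id, probability and rank from the header, plus the sequence. */
    static final class Peptide {
        final int id;
        final double probability;
        final int rank;
        final String sequence;

        Peptide(int id, double probability, int rank, String sequence) {
            this.id = id;
            this.probability = probability;
            this.rank = rank;
            this.sequence = sequence;
        }
    }

    /** Reads all entries into a map keyed by id, preserving file order. */
    static Map<Integer, Peptide> read(BufferedReader reader) throws IOException {
        Map<Integer, Peptide> peptides = new LinkedHashMap<>();
        String header = null;
        StringBuilder sequence = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith(">")) {
                // A new header closes the previous entry, if there was one.
                if (header != null) {
                    store(peptides, header, sequence.toString());
                }
                header = line.substring(1);
                sequence.setLength(0);
            } else {
                // Sequences may span several lines.
                sequence.append(line);
            }
        }
        if (header != null) {
            store(peptides, header, sequence.toString());
        }
        return peptides;
    }

    /** Convenience wrapper for in-memory text, e.g. for quick experiments. */
    static Map<Integer, Peptide> parse(String fastaText) {
        try {
            return read(new BufferedReader(new StringReader(fastaText)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void store(Map<Integer, Peptide> peptides, String header, String sequence) {
        String[] fields = header.split("\\|");
        if (fields.length != 4) {
            throw new IllegalArgumentException("Unexpected header: " + header);
        }
        Peptide peptide = new Peptide(Integer.parseInt(fields[1]),
                Double.parseDouble(fields[2]), Integer.parseInt(fields[3]), sequence);
        peptides.put(peptide.id, peptide);
    }

    public static void main(String[] args) {
        Map<Integer, Peptide> peptides =
                parse(">id|0|0.9992|1\nASITENGGAEEESVAK\n>id|2|0.9998|1\nASNASSAGDEVDNVATSSK\n");
        for (Peptide p : peptides.values()) {
            System.out.println(p.id + " " + p.probability + " " + p.rank + " " + p.sequence);
        }
    }
}
```

For a real file, read(new BufferedReader(new FileReader(file_name))) would replace the parse call. A map keyed by id seemed more convenient to me than a HashSet because entries can then be looked up by id, but that is a design choice, not a requirement.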