I should know better, but it's been a while since I've used BioJava...

I need to parse an EMBL file and write out, in FASTA format, the sequences for my organisms of interest.

In BioPerl I would do something like:

use Bio::SeqIO;

my $in  = Bio::SeqIO->new(-file => $file,        -format => 'EMBL');
my $out = Bio::SeqIO->new(-file => ">$file.out", -format => 'Fasta');
while ( my $seq = $in->next_seq() ) {
    my $acc     = $seq->accession_number;
    my $species = $seq->species->binomial();
    # there exists %myfavoritespecies somewhere...
    if ( exists $myfavoritespecies{$species} ) {
        $seq->display_id($acc);
        $out->write_seq($seq);
    }
}

In BioJava it's something like this (TOTALLY stolen from bioconf.otago.ac.nz):

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.NoSuchElementException;

import org.biojava.bio.BioException;
import org.biojava.bio.seq.Sequence;
import org.biojava.bio.seq.SequenceIterator;
import org.biojava.bio.seq.io.SeqIOTools;

public class TestEMBLParsing {
    public static void main(String[] args) {
        BufferedReader br = null;

        try {
            br = new BufferedReader(new FileReader(args[0]));
        } catch (FileNotFoundException e) {
            e.printStackTrace();
            return; // nothing to parse without the input file
        }
        // read the EMBL file
        SequenceIterator sequences = SeqIOTools.readEmbl(br);
        while (sequences.hasNext()) {
            try {
                Sequence seq = sequences.nextSequence();
                String accession = seq.getName();
                String fasta = seq.seqString();
                // how do I check to see if it's my species of interest?
                // how do I create a FASTA output stream?
                System.out.println(accession); // for testing
                System.out.println(fasta); // for testing
            } catch (NoSuchElementException e) {
                e.printStackTrace();
            } catch (BioException e) {
                e.printStackTrace();
            }
        }
    }
}
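
To make the goal concrete, here is a rough sketch of the same filter in plain Java, with no BioJava calls at all. It assumes the standard EMBL flat-file layout (two-letter line codes, AC for accessions, OS for the organism, SQ before the sequence block, // ending each record). The class name EmblToFastaSketch and the method filter are made up for illustration and are not part of any library. Taking the first two words of the OS line as the binomial is a simplification, not a proper taxonomy parse.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: copies EMBL records for wanted species to FASTA. */
public class EmblToFastaSketch {

    /** Writes matching records to out as FASTA; returns how many were written. */
    public static int filter(BufferedReader in, Set<String> wantedSpecies, Writer out)
            throws IOException {
        String accession = null;
        String species = null;
        StringBuilder residues = new StringBuilder();
        boolean inSequence = false;
        int written = 0;

        String line;
        while ((line = in.readLine()) != null) {
            if (line.startsWith("//")) {
                // End of record: emit it if the species is one we want.
                if (accession != null && species != null && wantedSpecies.contains(species)) {
                    out.write(">" + accession + "\n");
                    for (int i = 0; i < residues.length(); i += 60) {
                        int end = Math.min(i + 60, residues.length());
                        out.write(residues.substring(i, end));
                        out.write("\n");
                    }
                    written++;
                }
                // Reset state for the next record.
                accession = null;
                species = null;
                residues.setLength(0);
                inSequence = false;
            } else if (inSequence) {
                // Sequence lines mix residues, spaces and a trailing count; keep letters only.
                for (char c : line.toCharArray()) {
                    if (Character.isLetter(c)) {
                        residues.append(c);
                    }
                }
            } else if (line.startsWith("AC") && accession == null) {
                // "AC   X56734; S46826;" -> keep the first (primary) accession.
                String rest = line.substring(2).trim();
                int semi = rest.indexOf(';');
                accession = (semi >= 0 ? rest.substring(0, semi) : rest).trim();
            } else if (line.startsWith("OS") && species == null) {
                // "OS   Trifolium repens (white clover)" -> "Trifolium repens" (assumption).
                String[] words = line.substring(2).trim().split("\\s+");
                if (words.length >= 2) {
                    species = words[0] + " " + words[1];
                }
            } else if (line.startsWith("SQ")) {
                inSequence = true;
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: EmblToFastaSketch <file.embl> <binomial> [<binomial> ...]");
            return;
        }
        // args[0] is the EMBL file; the rest are binomials such as "Homo sapiens".
        Set<String> wanted = new HashSet<>(Arrays.asList(args).subList(1, args.length));
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]));
             Writer out = new OutputStreamWriter(System.out)) {
            filter(in, wanted, out);
        }
    }
}
```

Run as, for example, java EmblToFastaSketch input.embl "Homo sapiens" to print the matching records to standard output. What I'm really after is the BioJava equivalent of the species lookup and the FASTA writer.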


Thanks so much, and sorry if this has already been posted somewhere; I just couldn't find it looking around the website.

-Andreas



--------------
Andreas Matern
Bioinformatician
Bioinformatics - Research and Development
Lion Bioscience Research Inc.
141 Portland Street, 10th floor
Cambridge, MA 02139 USA
Phone: 617-245-5483
Fax: 617-245-5499
[EMAIL PROTECTED]
www.lionbioscience.com

_______________________________________________
Biojava-l mailing list - [EMAIL PROTECTED]
http://biojava.org/mailman/listinfo/biojava-l
