This piece of code should do it. The nrCGRandom changes with the windowlength though (I calculated this value from EPD a long time ago). I use org.apache.oro.text.regex.*;
Bye, Stein.
=============================================
public void annotateCpG(Sequence seq) throws Exception{ double nrCGRandom = 14.1048d; int windowLength = 200; int step=100; String motif = "cg"; // PatternMatcher matcher=new Perl5Matcher(); PatternCompiler compiler = new Perl5Compiler(); Pattern pattern = null; PatternMatcherInput input; MatchResult result; pattern = compiler.compile(motif); int nrOfOcc=0; int nrConsec=0; double gcContent = 0d; double cgValue=0d; int index = 1; String dnaStr; SymbolList window= null; int i=1; while (index<seq.length()-windowLength){ window = seq.subList(index,index+windowLength); dnaStr = window.seqString(); input = new PatternMatcherInput(dnaStr); while(matcher.contains(input, pattern)) { nrOfOcc++; } gcContent = GCContent(window); cgValue = (nrCgRandom/nrOfOcc); if (cgValue<0.6d && gcContent>50.0d){ nrConsec++; } else if (nrConsec>0){ int start = (index-(nrConsec)*step); int stop = index; nrConsec=0; Feature.Template template = new Feature.Template(); template.type = "misc_feature"; template.location = new RangeLocation(start,stop); template.annotation = new SimpleAnnotation(); template.annotation.setProperty("type","CpG Island"); seq.createFeature(template); } nrOfOcc=0; index=index+step; } }
public double GCContent(SymbolList seq){
int gc = 0;
for (int pos = 1; pos <= seq.length(); ++pos) {
org.biojava.bio.symbol.Symbol sym = seq.symbolAt(pos);
if (sym == org.biojava.bio.seq.DNATools.g() || sym == org.biojava.bio.seq.DNATools.c())
++gc;
}
double content = (gc*100)/seq.length();
return content;
}
Sylvain Foisy wrote:
Hi,
Sorry for the hand holding question. I am trying to wrap the output of a Perl script that finds CpG islands (cpgi130.pl) using Runtime.exec with little success. Has anyone ever built a class that can find CpG islands in Sequence objects and build features with RangeLocations?
Other solution: has anyone ever try to write a wrapper that would use the output of any program of the EMBOSS suite? It has some CpG island finding software.
Anyhelp would be much appreciated
P.S.: there has been a call for an algorithm repository on Usenet. Would it be possible/interesting to create such a repository for classes built with BioJava that could be of interest to other rookies like me.
Just an idea...
Sylvain
=================================================================== Sylvain Foisy, Ph. D. Directeur - operations / Project Manager BioneQ - Reseau quebecois de bio-informatique U. de Montreal / Genome-Quebec
Addresse postale:
Departement de biochimie Pavillon principal 2900, boul. Édouard-Montpetit Montréal (Québec) H3T 1J4
Tel: (514) 343-6111 x.4066 Courriel: [EMAIL PROTECTED] ===================================================================
_______________________________________________ Biojava-l mailing list - [EMAIL PROTECTED] http://biojava.org/mailman/listinfo/biojava-l
-- Stein Aerts [EMAIL PROTECTED] K.U.Leuven ESAT-SCD Belgium http://www.esat.kuleuven.ac.be/~dna/BioI
_______________________________________________ Biojava-l mailing list - [EMAIL PROTECTED] http://biojava.org/mailman/listinfo/biojava-l