This piece of code should do it. The nrCGRandom changes with the windowlength though (I calculated this value from EPD a long time ago). I use org.apache.oro.text.regex.*;
Bye, Stein.
=============================================
public void annotateCpG(Sequence seq) throws Exception{
double nrCGRandom = 14.1048d;
int windowLength = 200;
int step=100;
String motif = "cg";
//
PatternMatcher matcher=new Perl5Matcher();
PatternCompiler compiler = new Perl5Compiler();
Pattern pattern = null;
PatternMatcherInput input;
MatchResult result;
pattern = compiler.compile(motif);
int nrOfOcc=0;
int nrConsec=0;
double gcContent = 0d;
double cgValue=0d;
int index = 1;
String dnaStr;
SymbolList window= null;
int i=1;
while (index<seq.length()-windowLength){
window = seq.subList(index,index+windowLength);
dnaStr = window.seqString();
input = new PatternMatcherInput(dnaStr);
while(matcher.contains(input, pattern)) {
nrOfOcc++;
}
gcContent = GCContent(window);
cgValue = (nrCgRandom/nrOfOcc);
if (cgValue<0.6d && gcContent>50.0d){
nrConsec++;
}
else if (nrConsec>0){
int start = (index-(nrConsec)*step);
int stop = index;
nrConsec=0;
Feature.Template template = new Feature.Template();
template.type = "misc_feature";
template.location = new RangeLocation(start,stop);
template.annotation = new SimpleAnnotation();
template.annotation.setProperty("type","CpG Island");
seq.createFeature(template);
}
nrOfOcc=0;
index=index+step;
}
}public double GCContent(SymbolList seq){
int gc = 0;
for (int pos = 1; pos <= seq.length(); ++pos) {
org.biojava.bio.symbol.Symbol sym = seq.symbolAt(pos);
if (sym == org.biojava.bio.seq.DNATools.g() || sym == org.biojava.bio.seq.DNATools.c())
++gc;
}
double content = (gc*100)/seq.length();
return content;
}
Sylvain Foisy wrote:
Hi,
Sorry for the hand holding question. I am trying to wrap the output of a Perl script that finds CpG islands (cpgi130.pl) using Runtime.exec with little success. Has anyone ever built a class that can find CpG islands in Sequence objects and build features with RangeLocations?
Other solution: has anyone ever try to write a wrapper that would use the output of any program of the EMBOSS suite? It has some CpG island finding software.
Anyhelp would be much appreciated
P.S.: there has been a call for an algorithm repository on Usenet. Would it be possible/interesting to create such a repository for classes built with BioJava that could be of interest to other rookies like me.
Just an idea...
Sylvain
=================================================================== Sylvain Foisy, Ph. D. Directeur - operations / Project Manager BioneQ - Reseau quebecois de bio-informatique U. de Montreal / Genome-Quebec
Addresse postale:
Departement de biochimie Pavillon principal 2900, boul. �douard-Montpetit Montr�al (Qu�bec) H3T 1J4
Tel: (514) 343-6111 x.4066 Courriel: [EMAIL PROTECTED] ===================================================================
_______________________________________________ Biojava-l mailing list - [EMAIL PROTECTED] http://biojava.org/mailman/listinfo/biojava-l
-- Stein Aerts [EMAIL PROTECTED] K.U.Leuven ESAT-SCD Belgium http://www.esat.kuleuven.ac.be/~dna/BioI
_______________________________________________ Biojava-l mailing list - [EMAIL PROTECTED] http://biojava.org/mailman/listinfo/biojava-l
