Dear Sir,

Is there any way to detect patterns in the recorded k-mers?
I have a large set of miRNAs (a study of mutations and patterns in gastric cancer). I recorded the k-mers for each sequence, but the patterns that are generated are difficult to track. Can BioJava do this? Regular expressions in Java may be useful here. I would appreciate expert advice on this, and on any other software that might be useful.

Thanks,
Jitesh Dundas

On 10/29/10, jitesh dundas <[email protected]> wrote:
> Dear Friends,
>
> Thanks to Vishal & Andy for this. I actually needed this code too.
> Vishal, I think Andy's suggestions may be a good option to include in
> BioJava 3. Would you like to add this to BioJava 3?
>
> Thanks again.
>
> Regards,
> Jitesh Dundas
>
> On 10/29/10, Andy Yates <[email protected]> wrote:
>> Hi Vishal,
>>
>> As far as I am aware there is nothing which will generate them in BioJava
>> at the moment. However, it is possible to do it with BioJava3:
>>
>> public static void main(String[] args) {
>>   DNASequence d = new DNASequence("ATGATC");
>>   System.out.println("Non-Overlap");
>>   nonOverlap(d);
>>   System.out.println("Overlap");
>>   overlap(d);
>> }
>>
>> public static final int KMER = 3;
>>
>> //Generate triplets overlapping
>> public static void overlap(Sequence<NucleotideCompound> d) {
>>   List<WindowedSequence<NucleotideCompound>> l =
>>     new ArrayList<WindowedSequence<NucleotideCompound>>();
>>   for(int i=1; i<=KMER; i++) {
>>     SequenceView<NucleotideCompound> sub = d.getSubSequence(
>>       i, d.getLength());
>>     WindowedSequence<NucleotideCompound> w =
>>       new WindowedSequence<NucleotideCompound>(sub, KMER);
>>     l.add(w);
>>   }
>>
>>   //Will return ATG, ATC, TGA & GAT
>>   for(WindowedSequence<NucleotideCompound> w: l) {
>>     for(List<NucleotideCompound> subList: w) {
>>       System.out.println(subList);
>>     }
>>   }
>> }
>>
>> //Generate triplet Compound lists non-overlapping
>> public static void nonOverlap(Sequence<NucleotideCompound> d) {
>>   WindowedSequence<NucleotideCompound> w =
>>     new WindowedSequence<NucleotideCompound>(d, KMER);
>>   //Will return ATG & ATC
>>   for(List<NucleotideCompound> subList: w) {
>>     System.out.println(subList);
>>   }
>> }
>>
>> The disadvantage of all of these solutions is that they generate lists of
>> Compounds, so k-mer generation can/will be a memory-intensive operation.
>> This does not mean it has to be, since sub sequences are thin wrappers
>> around an underlying sequence. Also, the overlap solution is non-optimal
>> since it iterates through each window in turn rather than stepping through
>> the sequence one base at a time (hence why we get ATG & ATC before TGA).
>>
>> As for unique k-mers, that's something which would require a bit more
>> engineering & would be better suited to a solution built around a Trie
>> (prefix tree).
>>
>> Hope this helps,
>>
>> Andy
>>
>> On 28 Oct 2010, at 18:40, Vishal Thapar wrote:
>>
>>> Hi All,
>>>
>>> I had a quick question: Does BioJava have a method to generate k-mers or
>>> do k-mer counting for a given FASTA sequence / file? Basically, I want
>>> k-mer counts for every sequence in a FASTA file. If something like this
>>> exists it would save me some time writing the code.
>>>
>>> Thanks,
>>>
>>> Vishal
>>> _______________________________________________
>>> Biojava-l mailing list - [email protected]
>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
>> --
>> Andrew Yates                   Ensembl Genomes Engineer
>> EMBL-EBI                       Tel: +44-(0)1223-492538
>> Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
>> Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/
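
For the k-mer counting asked about in the thread, and the regular-expression idea in the top message, a minimal sketch in plain Java might look like the following. It makes no BioJava calls and assumes each sequence is already available as a plain string. The class name KmerSketch and the methods countKmers and filterByPattern are hypothetical names made up for this illustration; they are not part of BioJava.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only; not part of BioJava.
class KmerSketch {

    // Count every overlapping k-mer by sliding a window one base at a time.
    public static Map<String, Integer> countKmers(String seq, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        // TreeMap keeps the k-mers sorted, which makes the output easy to scan.
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i + k <= seq.length(); i++) {
            counts.merge(seq.substring(i, i + k), 1, Integer::sum);
        }
        return counts;
    }

    // Keep only the k-mers whose whole text matches the regular expression.
    public static Map<String, Integer> filterByPattern(
            Map<String, Integer> counts, String regex) {
        Pattern p = Pattern.compile(regex);
        Map<String, Integer> hits = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (p.matcher(e.getKey()).matches()) {
                hits.put(e.getKey(), e.getValue());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Same example sequence as in Andy's code.
        Map<String, Integer> counts = countKmers("ATGATC", 3);
        System.out.println(counts);
        // Only the 3-mers that start with AT.
        System.out.println(filterByPattern(counts, "AT[ACGT]"));
    }
}
```

For miRNA sequences the character class in the pattern would be [ACGU] rather than [ACGT]. A map keyed on strings is the simplest choice here; for very large inputs a trie, as Andy suggests, may use less memory.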
