You are right again, my friend. That would definitely hang up my machine during the XML file parsing.
This is about sequence alignment and related modules. I will look at this today and send a fix for that. Hope that you can help.

PS: What about pattern matching in sequences? Would that be interesting to have in BioJava 3?

Regards,
JD

On 10/29/10, Andy Yates <[email protected]> wrote:
> Okay couple of points here:
>
> 1). Which biojava3 module? This sounds like something for the genomic module
> rather than core
>
> 2). It'll need some more work. I'm not happy about using the
> WindowedSequenceView in its current state. I think an alteration to avoid it
> making Lists would be a good idea (plus recent developments in the API as to
> its main use means this is a viable change). Also it should return the
> overlapping ones in base order i.e. 1->3, 2->4 not 1->3, 4->6
>
> Comments?
>
> Andy
>
> On 29 Oct 2010, at 10:12, jitesh dundas wrote:
>
>> Dear Friends,
>>
>> Thanks to Vishal & Andy for this. I actually needed this code too.
>> Vishal, I think Andy's suggestions may be a good option to include in
>> BioJava 3. Would you like to add this to BioJava 3?
>>
>> Thanks again.
>>
>> Regards,
>> Jitesh Dundas
>>
>> On 10/29/10, Andy Yates <[email protected]> wrote:
>>> Hi Vishal,
>>>
>>> As far as I am aware there is nothing which will generate them in BioJava
>>> at the moment.
>>> However it is possible to do it with BioJava3:
>>>
>>> public static void main(String[] args) {
>>>   DNASequence d = new DNASequence("ATGATC");
>>>   System.out.println("Non-Overlap");
>>>   nonOverlap(d);
>>>   System.out.println("Overlap");
>>>   overlap(d);
>>> }
>>>
>>> public static final int KMER = 3;
>>>
>>> //Generate triplets overlapping
>>> public static void overlap(Sequence<NucleotideCompound> d) {
>>>   List<WindowedSequence<NucleotideCompound>> l =
>>>     new ArrayList<WindowedSequence<NucleotideCompound>>();
>>>   for(int i=1; i<=KMER; i++) {
>>>     SequenceView<NucleotideCompound> sub = d.getSubSequence(
>>>       i, d.getLength());
>>>     WindowedSequence<NucleotideCompound> w =
>>>       new WindowedSequence<NucleotideCompound>(sub, KMER);
>>>     l.add(w);
>>>   }
>>>
>>>   //Will return ATG, ATC, TGA & GAT
>>>   for(WindowedSequence<NucleotideCompound> w: l) {
>>>     for(List<NucleotideCompound> subList: w) {
>>>       System.out.println(subList);
>>>     }
>>>   }
>>> }
>>>
>>> //Generate triplet Compound lists non-overlapping
>>> public static void nonOverlap(Sequence<NucleotideCompound> d) {
>>>   WindowedSequence<NucleotideCompound> w =
>>>     new WindowedSequence<NucleotideCompound>(d, KMER);
>>>   //Will return ATG & ATC
>>>   for(List<NucleotideCompound> subList: w) {
>>>     System.out.println(subList);
>>>   }
>>> }
>>>
>>> The disadvantage of all of these solutions is that they generate lists of
>>> Compounds so kmer generation can/will be a memory intensive operation. This
>>> does not mean it has to be since sub sequences are thin wrappers around an
>>> underlying sequence. Also the overlap solution is non-optimal since it
>>> iterates through each window rather than stepping through delegating onto
>>> each base in turn (hence why we get ATG & ATC before TGA)
>>>
>>> As for unique k-mers that's something which would require a bit more
>>> engineering & would be better suited to a solution built around a Trie
>>> (prefix tree).
>>>
>>> Hope this helps,
>>>
>>> Andy
>>>
>>> On 28 Oct 2010, at 18:40, Vishal Thapar wrote:
>>>
>>>> Hi All,
>>>>
>>>> I had a quick question: Does Biojava have a method to generate k-mers or
>>>> K-mer counting in a given Fasta Sequence / File? Basically, I want k-mer
>>>> counts for every sequence in a fasta file. If something like this exists
>>>> it would save me some time to write the code.
>>>>
>>>> Thanks,
>>>>
>>>> Vishal
>>>> _______________________________________________
>>>> Biojava-l mailing list - [email protected]
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>
>>> --
>>> Andrew Yates                   Ensembl Genomes Engineer
>>> EMBL-EBI                       Tel: +44-(0)1223-492538
>>> Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
>>> Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/
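
For the original question (k-mer counts per sequence), a minimal sketch that avoids building Compound lists is to slide a window over the raw sequence string. This is plain Java with no BioJava calls. The class name KmerCounter and the method countKmers are hypothetical names used only for illustration, and the sketch assumes each FASTA record has already been read into a String. It visits the windows in base order (1->3, 2->4, ...), as Andy suggests above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of BioJava: counts overlapping k-mers in a
// plain String, visiting windows in base order (1->3, 2->4, ...).
class KmerCounter {

    // Returns k-mer -> count, with keys in order of first appearance.
    static Map<String, Integer> countKmers(String sequence, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        Map<String, Integer> counts = new LinkedHashMap<>();
        // Step one base at a time; stop when fewer than k bases remain.
        for (int i = 0; i + k <= sequence.length(); i++) {
            String kmer = sequence.substring(i, i + k);
            counts.merge(kmer, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Same example sequence as in the BioJava3 code quoted above.
        Map<String, Integer> counts = countKmers("ATGATC", 3);
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

For "ATGATC" with k = 3 this prints ATG, TGA, GAT and ATC, each with a count of 1. The keys are still one String per distinct k-mer, so for large k or very long sequences the Trie approach mentioned above would likely use less memory.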
