On Fri, Apr 7, 2017 at 1:13 AM, Hervé Pagès <hpa...@fredhutch.org> wrote:
>
> This is the expected behavior.
>
> Some background: BSgenomeViews are list-like objects where the *list
> elements* (i.e. the elements one extracts with [[) are the DNA
> sequences from the views
--snip--
> The important difference is that with [[ I get a DNAString object
> (the content of the view) and with [ I get a BSgenomeViews object
> of length 1.

Thank you, Hervé! I was failing to make the connection with the `[[`
accessor.

On Fri, Apr 7, 2017 at 1:16 AM, Michael Lawrence <lawrence.mich...@gene.com> wrote:
>
> I'm curious as to why you are looping over the views in the first
> place. Maybe we could arrive at a vectorized solution, which is often
> but not always simpler and faster.

Hi Michael!

Broad background is I'm acculturating an undergraduate student to
writing a Bioconductor package and applying software engineering
practices of version control, unit testing, documenting, dependency
setup and validation in a different environment on our university HPC
cluster, etc. The student also came along to LibrePlanet to better
understand the culture of software freedom :o)

The package goal is to use Biostrings to look for repeating DNA
sequences of a fixed k-mer size and subset to the portions of the
genome without repeats (an aligner can do this ofc, but the goal is to
teach R and engineering practices).

I appreciate your thoughtfulness in vectorizing the code to best use
BSgenomeViews, but please don't spend more than 10 minutes on it, as I
have to balance changes to the code with the student's learning and
coding "voice" and may not do proper justice to more of your effort.

My slowness to reply was from getting the project further along so that
it is more understandable. Here is the line which I've updated, as
Hervé suggested, to use seq_along():

https://github.com/coregenomics/kmap/blob/4adaed6b8007e8ea39f39ff57a42a821445d3d46/R/BiostringsProjectNEW.R#L185

(I'm having a hard time thinking of how to summarize a small example
out of context.)

Although in that line ranges_hits() is only operating on single
indices, ranges_hits() was written to process groups of indices to
reduce multi-processor communication. Generating such sets of indices
would involve applying width() to the views inside mappable() to break
it into chunks of, say, a million bases for matchPDict().

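The chunking idea in the previous paragraph can be sketched in base R.
This is only an illustration under stated assumptions: chunk_indices()
is a hypothetical helper that does not exist in kmap or Biostrings, and
the widths vector stands in for what width() would return on the views.
Chunks come out at roughly the target size rather than exactly, since
views are never split.

```r
# Hypothetical helper (not part of kmap or Biostrings): group view
# indices so that each group covers roughly `chunk_size` bases.
chunk_indices <- function(widths, chunk_size = 1e6) {
  stopifnot(is.numeric(widths), all(widths >= 0), chunk_size > 0)
  # Cumulative end of each view if the views were laid end to end;
  # as.numeric() avoids integer overflow on genome-scale totals.
  ends <- cumsum(as.numeric(widths))
  # Views whose cumulative end falls in the same window share a group.
  group <- ceiling(ends / chunk_size)
  # Leading zero-width views would land in group 0; fold them into 1.
  group[group == 0] <- 1
  unname(split(seq_along(widths), group))
}

# Made-up widths standing in for width(views).
widths <- c(400000, 700000, 300000, 900000, 100000)
chunks <- chunk_indices(widths, chunk_size = 1e6)
```

Each element of chunks is a group of view indices of the kind
ranges_hits() is described as accepting; how they would actually be
passed depends on its signature, which is not shown here.
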
Again, I'm linking to the code in case anything stands out to you, but
I will feel bad if you spend a lot of time on it.

> H.

> Michael

Pariksheet

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel