On 03/12/2022 07:21, Bert Gunter wrote:
Perhaps it is worth pointing out that looping constructs like lapply() can
be avoided and the procedure vectorized by mimicking Martin Morgan's
solution:
## s is the string to be searched.
diff(c(0,grep('b',strsplit(s,'')[[1]])))
However, Martin's solution is simpler and likely even faster as the regex
engine is unneeded:
diff(c(0, which(strsplit(s, "")[[1]] == "b"))) ## completely vectorized
This seems much preferable to me.
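As a quick check, both one-liners can be run side by side on the example string from the original question. This is only a sketch: the name s follows the comment above, and the intermediate variable names are made up for the illustration.

```r
## Example string from the original post.
s <- "abaaabbaaaaabaaab"

## Split into single characters, then locate the b's.
chars <- strsplit(s, "", fixed = TRUE)[[1]]
pos_regex <- grep("b", chars)        # lookup through the regex engine
pos_exact <- which(chars == "b")     # plain equality test

## Prepending 0 makes the first interval count from the start of the string.
intervals_regex <- diff(c(0, pos_regex))
intervals_exact <- diff(c(0, pos_exact))
intervals_exact
#> [1] 2 4 1 6 4
```

Both variants give the same intervals here; they differ only in how the positions of 'b' are found.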
Of all the proposed solutions, Andrew Hart's solution seems the most
efficient:
big_string <- strrep("abaaabbaaaaabaaabaaaaaaaaaaaaaaaaaaab", 500000)
system.time(nchar(strsplit(big_string, split="b", fixed=TRUE)[[1]]) + 1)
#    user  system elapsed
#   0.736   0.028   0.764
system.time(diff(c(0, which(strsplit(big_string, "", fixed=TRUE)[[1]]
                            == "b"))))
#    user  system elapsed
#   2.100   0.356   2.455
The bigger the string, the bigger the gap in performance.
The performance gap also widens as the average distance between two
successive b's grows.
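To confirm that the two timed expressions compute the same thing, here is a small check on the short example string rather than on big_string. One caveat that is not raised in the thread: when the string has characters after the last 'b', strsplit() also returns that trailing piece, so the nchar() version yields one extra value. The names s, s2, by_pieces and by_positions are made up for the illustration.

```r
s <- "abaaabbaaaaabaaab"

## Length of each piece between b's, plus 1 for the 'b' itself.
by_pieces <- nchar(strsplit(s, split = "b", fixed = TRUE)[[1]]) + 1
## Differences between successive positions of 'b'.
by_positions <- diff(c(0, which(strsplit(s, "", fixed = TRUE)[[1]] == "b")))
by_pieces
#> [1] 2 4 1 6 4
all(by_pieces == by_positions)
#> [1] TRUE

## With trailing characters after the last 'b', the piece-based version
## returns one extra value for the leftover piece.
s2 <- "abaa"
extra <- nchar(strsplit(s2, split = "b", fixed = TRUE)[[1]]) + 1
extra
#> [1] 2 3
exact <- diff(c(0, which(strsplit(s2, "", fixed = TRUE)[[1]] == "b")))
exact
#> [1] 2
```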
Finally: always use fixed=TRUE in strsplit() if you don't need to use
the regex engine.
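A small illustration of why this matters beyond speed, as a sketch (the string x is made up for the example): without fixed=TRUE the split pattern is interpreted as a regular expression, so a pattern such as "." does not mean a literal dot.

```r
x <- "a.b.c"

## Default: the split pattern is a regular expression, and "." matches
## any character, so no letters survive the split.
regex_split <- strsplit(x, ".")[[1]]

## fixed = TRUE: the split pattern is taken literally.
fixed_split <- strsplit(x, ".", fixed = TRUE)[[1]]
fixed_split
#> [1] "a" "b" "c"

## The two results differ.
identical(regex_split, fixed_split)
#> [1] FALSE
```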
Cheers,
H.
-- Bert
On Sat, Dec 3, 2022 at 12:49 AM Rui Barradas <ruipbarra...@sapo.pt> wrote:
At 17:18 on 02/12/2022, Evan Cooch wrote:
Was wondering if there is an 'efficient/elegant' way to do the following
(without tidyverse). Take a string
abaaabbaaaaabaaab
It's easy enough to count the number of times the character 'b' shows up
in the string, but...what I'm looking for is outputting the 'intervals'
between occurrences of 'b' (starting the counter at the beginning of the
string). So, for the preceding example, 'b' shows up in positions
2, 6, 7, 13, 17
So, the interval data would be: 2, 4, 1, 6, 4
My main approach has been to simply output positions (say, something
like unlist(gregexpr('b', target_string))), and 'do the math' between
successive positions. Can anyone suggest a more elegant approach?
Thanks in advance...
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Hello,
I don't find your solution inelegant; it's even easy to write it as a
one-line function.
char_interval <- function(x, s) {
  lapply(gregexpr(x, s), \(y) c(head(y, 1), diff(y)))
}
target_string <- "abaaabbaaaaabaaab"
char_interval('b', target_string)
#> [[1]]
#> [1] 2 4 1 6 4
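Because gregexpr() is vectorized over its text argument, the same function also handles several strings at once. The sketch below shows this, along with what happens when a string contains no match. The strings vector is made up for the illustration, and the function is written with function(y) instead of \(y) only so that it also runs on R versions older than 4.1.

```r
char_interval <- function(x, s) {
  lapply(gregexpr(x, s), function(y) c(head(y, 1), diff(y)))
}

strings <- c("abaaabbaaaaabaaab", "bb", "aaa")
res <- char_interval("b", strings)
res[[1]]
#> [1] 2 4 1 6 4
res[[2]]
#> [1] 1 1
res[[3]]   # no 'b' at all: gregexpr() reports -1 for "no match"
#> [1] -1
```

The -1 in the last case is a "no match" marker rather than an interval, so it may need to be filtered out before further use.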
Hope this helps,
Rui Barradas
--
Hervé Pagès
Bioconductor Core Team
hpages.on.git...@gmail.com