On 03/12/2022 07:21, Bert Gunter wrote:
Perhaps it is worth pointing out that looping constructs like lapply() can
be avoided and the procedure vectorized by mimicking Martin Morgan's
solution:

## s is the string to be searched.
diff(c(0,grep('b',strsplit(s,'')[[1]])))

However, Martin's solution is simpler and likely even faster as the regex
engine is unneeded:

diff(c(0, which(strsplit(s, "")[[1]] == "b"))) ## completely vectorized

This seems much preferable to me.
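As a quick sanity check, here is a minimal sketch that runs both one-liners on the sample string from the original question. The variable names are illustrative only, not from the thread.

```r
# Sample string from the original question
s <- "abaaabbaaaaabaaab"

# Split into single characters once; fixed = TRUE skips the regex engine
chars <- strsplit(s, "", fixed = TRUE)[[1]]

# grep()-based variant: positions of "b", then successive differences
via_grep <- diff(c(0, grep("b", chars, fixed = TRUE)))

# which()-based variant: the same positions from a plain comparison
via_which <- diff(c(0, which(chars == "b")))

via_which
# 2 4 1 6 4
identical(via_grep, via_which)
```

Both variants find the same positions (2, 6, 7, 13, 17), so they should yield the same intervals.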

Of all the proposed solutions, Andrew Hart's solution seems the most efficient:

  big_string <- strrep("abaaabbaaaaabaaabaaaaaaaaaaaaaaaaaaab", 500000)

  system.time(nchar(strsplit(big_string, split="b", fixed=TRUE)[[1]]) + 1)
  #    user  system elapsed
  #   0.736   0.028   0.764

  system.time(diff(c(0, which(strsplit(big_string, "", fixed=TRUE)[[1]] == "b"))))
  #    user  system elapsed
  #  2.100   0.356   2.455

The bigger the string, the bigger the gap in performance.

Also, the bigger the average gap between two successive b's, the bigger the gap in performance.
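One caveat about the strsplit()/nchar() approach timed above: it counts every piece that strsplit() returns, so a string that does not end in 'b' yields one extra value for the trailing run. The sketch below guards against that. The helper name b_intervals and its ch argument are hypothetical, and ch is assumed to be a single literal character.

```r
# Hypothetical helper (not from the thread): intervals between occurrences
# of a single literal character `ch`, counted from the start of the string.
b_intervals <- function(s, ch = "b") {
  pieces <- strsplit(s, ch, fixed = TRUE)[[1]]
  n <- nchar(pieces) + 1
  # When s does not end in ch, the last piece is a trailing run that is
  # not terminated by ch, so its count is dropped.
  if (!endsWith(s, ch)) n <- head(n, -1)
  n
}

b_intervals("abaaabbaaaaabaaab")
# 2 4 1 6 4
b_intervals("abaa")
# 2
```

The guard costs one endsWith() call, so it should not change the timing picture noticeably.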

Finally: always use fixed=TRUE in strsplit() if you don't need to use the regex engine.
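Besides speed, fixed=TRUE also avoids surprises when the split string contains a regex metacharacter. A small illustration, with a made-up input string:

```r
x <- "a.b.c"

# Without fixed = TRUE the "." is a regex that matches every character,
# so all that is left are empty strings.
regex_split <- strsplit(x, ".")[[1]]

# With fixed = TRUE the "." is taken literally.
literal_split <- strsplit(x, ".", fixed = TRUE)[[1]]

regex_split
literal_split
# "a" "b" "c"
```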

Cheers,

H.


-- Bert





On Sat, Dec 3, 2022 at 12:49 AM Rui Barradas <ruipbarra...@sapo.pt> wrote:

At 17:18 on 02/12/2022, Evan Cooch wrote:
Was wondering if there is an 'efficient/elegant' way to do the following
(without tidyverse). Take a string

abaaabbaaaaabaaab

It's easy enough to count the number of times the character 'b' shows up
in the string, but...what I'm looking for is outputting the 'intervals'
between occurrences of 'b' (starting the counter at the beginning of the
string). So, for the preceding example, 'b' shows up in positions

2, 6, 7, 13, 17

So, the interval data would be: 2, 4, 1, 6, 4

My main approach has been to simply output positions (say, something
like unlist(gregexpr('b', target_string))), and 'do the math' between
successive positions. Can anyone suggest a more elegant approach?

Thanks in advance...

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Hello,

I don't find your solution inelegant; it's even easy to write as a
one-line function.


char_interval <- function(x, s) {
    lapply(gregexpr(x, s), \(y) c(head(y, 1), diff(y)))
}

target_string <- "abaaabbaaaaabaaab"
char_interval('b', target_string)
#> [[1]]
#> [1] 2 4 1 6 4
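
One thing to watch with gregexpr(): it reports "no match" as -1, so the function above returns -1 for a string that contains no 'b'. A possible variant that returns an empty vector in that case is sketched below. The name char_interval2 is hypothetical, and fixed = TRUE assumes the pattern is a literal string.

```r
# Hypothetical variant of char_interval(): same idea, but a string with
# no match yields integer(0) instead of -1.
char_interval2 <- function(x, s) {
  lapply(gregexpr(x, s, fixed = TRUE), function(pos) {
    if (pos[1] == -1L) return(integer(0))  # gregexpr() uses -1 for "no match"
    diff(c(0L, pos))                       # prepend 0 so the first interval is the first position
  })
}

char_interval2("b", c("abaaabbaaaaabaaab", "aaa"))
```

It also uses function(pos) rather than the \(y) shorthand, so it should run on R versions older than 4.1.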


Hope this helps,

Rui Barradas


--
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com
