Wacek Kusnierczyk wrote:
> Gabor Grothendieck wrote:
>> On Sat, Jan 31, 2009 at 4:46 PM, Wacek Kusnierczyk
>> <waclaw.marcin.kusnierc...@idi.ntnu.no> wrote:
>>>
>>> to extend the context, if you were to solve the problem in perl, the
>>> regex below would work in perl 5.10, but not in earlier versions of
>>> perl; another approach is to replace the unwanted leading characters
>>> with equally many replacement characters at once.
>>>
>>>    $string = 'aabaab';
>>>
>>>    # perl 5.10
>>>    $string =~ s/a|(*COMMIT)(*FAIL)/c/g
>>>    # $string is 'ccbaab'
>>>
>>>    # any recent perl
>>>    $string =~ s/^a*/'c' x length $&/e;
>>>    # $string is 'ccbaab'
>>>
>>> i don't know how (if) the latter could be done in r.
>>>
>> This seems quite analogous:
>>
>> library(gsubfn)
>> s <- "aabaab"
>> gsubfn("^a*", ~ paste(rep("c", nchar(x)), collapse = ""), s)[[1]]
>>
>
> indeed, as does the following variant:
>
>    gsubfn("^a*", ~ gsub(".", "c", x), s)[[1]]
>

just for the record, the two gsubfn-based versions run substantially
slower than the gsub-based one; with 1000 strings of 100 random letters
each, the difference is 2 orders of magnitude (see the attached naive
test). i guess much of it is due to the r-based implementation of
gsubfn, and if it were implemented in c the difference would shrink
dramatically.

vQ

#!/usr/bin/r

n.strings = 1000
n.letters = 100
n.repetitions = 100

# random test strings of lowercase letters
strings = replicate(n.strings,
    paste(sample(letters, n.letters, replace=TRUE), collapse=""))

library(gsubfn)

results = list(
    # gsub with perl 5.10 backtracking control verbs
    system.time(replicate(n.repetitions,
        gsub('a|(*COMMIT)(*FAIL)', '-', strings, perl=TRUE))),
    # gsubfn, replacement built with paste and rep
    system.time(replicate(n.repetitions,
        gsubfn('^a*', ~ paste(rep('-', nchar(x)), collapse=""), strings))),
    # gsubfn, replacement built with an inner gsub
    system.time(replicate(n.repetitions,
        gsubfn('^a*', ~ gsub('.', '-', x), strings))))

print(results)

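For comparison, the idea from the perl snippet of replacing the leading run with equally many replacement characters can probably also be written in base R without gsubfn, which avoids calling an R function for every match. The following is only a sketch and is not part of the attached test; it has not been timed here. The helper name replace.leading is made up for illustration, the character is assumed to be a single literal with no special meaning in a regex, and strrep requires R >= 3.3.0.

```r
# hypothetical helper (not from gsubfn or base R): replace the leading run
# of `char` in each string with equally many copies of `replacement`.
# assumes `char` is a single literal character with no regex meaning.
replace.leading = function(strings, char = 'a', replacement = 'c') {
    # '^a*' matches at position 1 of every string, possibly with length 0
    m = regexpr(paste0('^', char, '*'), strings)
    n = attr(m, 'match.length')
    # n copies of the replacement, followed by the untouched remainder
    paste0(strrep(replacement, n), substring(strings, n + 1))
}

print(replace.leading(c('aabaab', 'baab', 'aaaa')))
```

Everything in the helper is vectorised over the input, so it could be dropped into the list of timed expressions above as one more system.time(...) entry if a comparison is wanted.
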
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.