Wacek Kusnierczyk wrote:
> Gabor Grothendieck wrote:
>   
>> On Sat, Jan 31, 2009 at 4:46 PM, Wacek Kusnierczyk
>> <waclaw.marcin.kusnierc...@idi.ntnu.no> wrote:
>>>
>>> to extend the context, if you were to solve the problem in perl, the
>>> regex below would work in perl 5.10, but not in earlier versions of
>>> perl;  another approach is to replace the unwanted leading characters
>>> with equally many replacement characters at once.
>>>
>>> $string = 'aabaab';
>>>
>>> # perl 5.10
>>> $string =~ s/a|(*COMMIT)(*FAIL)/c/g;
>>> # $string is 'ccbaab'
>>>
>>> # any recent perl
>>> $string =~ s/^a*/'c' x length $&/e;
>>> # $string is 'ccbaab'
>>>
>>> i don't know how (if) the latter could be done in r.
>>
>> This seems quite analogous:
>>
>> library(gsubfn)
>> s <- "aabaab"
>> gsubfn("^a*", ~ paste(rep("c", nchar(x)), collapse = ""), s)[[1]]
>
> indeed, as does the following variant:
>
> gsubfn("^a*", ~ gsub(".", "c", x), s)[[1]]
>
>   

just for the record, the two gsubfn-based versions run substantially
slower than the gsub-based one (gsub with perl=TRUE and the
(*COMMIT)(*FAIL) pattern);  with 1000 strings of 100 random letters
each, the difference is about two orders of magnitude (see the attached
naive test).  i guess much of this is due to gsubfn being implemented in
r, and once you have it in c the difference should shrink dramatically.
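as for doing the "equally many replacement characters" trick in r
without calling an r function once per match, a vectorized base-r
sketch could look like the one below.  it assumes r >= 3.3.0 (for
strrep), non-NA input, and that `lead` is a single literal character
that is not a regex metacharacter;  replace_leading is a hypothetical
helper, not part of gsubfn or base r, and i have not timed it against
the versions above.

```r
# hypothetical helper (not from gsubfn or base r): replace the leading run
# of `lead` characters in each string with the same number of `char`s.
# assumes non-NA input and a single, non-metacharacter `lead`.
replace_leading <- function(s, lead = "a", char = "c") {
  # "^a*" always matches (possibly with length 0), so match.length is the
  # length of the leading run for every element
  n <- attr(regexpr(paste0("^", lead, "*"), s), "match.length")
  # n copies of `char`, followed by the rest of the string unchanged
  paste0(strrep(char, n), substring(s, n + 1L))
}

replace_leading(c("aabaab", "baab", "aaa"))
# returns c("ccbaab", "baab", "ccc")
```

the whole job is done by three vectorized calls (regexpr, strrep,
substring), so there is no per-string callback as in gsubfn.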

vQ
#!/usr/bin/r

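n.strings = 1000
n.letters = 100
n.repetitions = 100

# n.strings random strings, each n.letters lowercase letters long
strings = replicate(n.strings,
                    paste(sample(letters, n.letters, replace=TRUE), collapse=""))

library(gsubfn)
results = list(
        # gsub with PCRE verbs: replace each leading 'a', stop at the first non-'a'
        gsub.commit.fail = system.time(replicate(n.repetitions,
                gsub('a|(*COMMIT)(*FAIL)', '-', strings, perl=TRUE))),
        # gsubfn: replace the leading run of a's with a string of equal length
        gsubfn.rep.paste = system.time(replicate(n.repetitions,
                gsubfn('^a*', ~ paste(rep('-', nchar(x)), collapse=""), strings))),
        # gsubfn: same, building the replacement with an inner gsub
        gsubfn.inner.gsub = system.time(replicate(n.repetitions,
                gsubfn('^a*', ~ gsub('.', '-', x), strings))))

print(results)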
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.