> On Dec 11, 2015, at 7:50 AM, Adrian Dușa <dusa.adr...@unibuc.ro> wrote:
>
> For the regexp aficionados, out there:
>
> I need a regular expression to extract either everything within some
> brackets, or everything outside the brackets, in a string.
>
> This would be the test string:
> "A1{0}~B0{1} CO{a2}NN{12}"
>
> Everything outside the brackets would be:
>
> "A1 ~B0 CO NN"
>
> and everything inside the brackets would be:
>
> "0 1 a2 12"
>
> I have a working solution involving strsplit(), but I wonder if there is a
> more direct way.
> Thanks in advance for any hint,
> Adrian

x <- "A1{0}~B0{1} CO{a2}NN{12}"

The first is a bit easier:

> gsub("\\{[[:alnum:]]*\\}", " ", x)
[1] "A1 ~B0  CO NN "

The second, at least using standard functions, is a bit more cumbersome, given the repeated sequences:

> gsub("\\{|\\}", "", regmatches(x, gregexpr("\\{[[:alnum:]]+\\}", x))[[1]])
[1] "0"  "1"  "a2" "12"

Note that a multi-element vector is returned.

In the above:

> gregexpr("\\{[[:alnum:]]+\\}", x)
[[1]]
[1]  3  9 15 21
attr(,"match.length")
[1] 3 3 4 4
attr(,"useBytes")
[1] TRUE

returns the starting positions and lengths of the matches, which are passed to regmatches() to get the actual values:

> regmatches(x, gregexpr("\\{[[:alnum:]]+\\}", x))
[[1]]
[1] "{0}"  "{1}"  "{a2}" "{12}"

The gsub() then strips the braces from the returned matches.

You could invert the result of regmatches() to get the pieces outside the braces:

> regmatches(x, gregexpr("\\{[[:alnum:]]+\\}", x), invert = TRUE)[[1]]
[1] "A1"  "~B0" " CO" "NN"  ""

Of course, this presumes non-nesting of braces, etc.

Regards,

Marc Schwartz

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
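
Since the original question asked for single space-separated strings rather than vectors, the two regmatches() steps could be wrapped up roughly as follows. This is only a sketch: split_braces() is a hypothetical helper name, not a base R function, and it assumes braces are never nested and contain only alphanumeric characters.

```r
# Hypothetical helper (not part of base R or the original thread): split a
# string into the text inside and outside {...} groups. Assumes non-nested
# braces whose contents are alphanumeric.
split_braces <- function(x) {
  m <- gregexpr("\\{[[:alnum:]]+\\}", x)

  # Matched groups such as "{0}" or "{a2}", with the braces stripped.
  inside <- gsub("[{}]", "", regmatches(x, m)[[1]])

  # Pieces between the matches; trim whitespace and drop empty pieces.
  outside <- trimws(regmatches(x, m, invert = TRUE)[[1]])
  outside <- outside[nzchar(outside)]

  list(inside  = paste(inside, collapse = " "),
       outside = paste(outside, collapse = " "))
}

x <- "A1{0}~B0{1} CO{a2}NN{12}"
res <- split_braces(x)
res$inside    # "0 1 a2 12"
res$outside   # "A1 ~B0 CO NN"
```

Trimming and dropping the empty pieces is a design choice to match the output requested in the question; leave those two lines out if the original spacing matters.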