Re: [R] regexp inside and outside brackets

Marc Schwartz Fri, 11 Dec 2015 06:41:19 -0800

> On Dec 11, 2015, at 7:50 AM, Adrian Dușa <dusa.adr...@unibuc.ro> wrote:
> 
> For the regexp aficionados, out there:
> 
> I need a regular expression to extract either everything within some
> brackets, or everything outside the brackets, in a string.
> 
> This would be the test string:
> "A1{0}~B0{1} CO{a2}NN{12}"
> 
> Everything outside the brackets would be:
> 
> "A1 ~B0 CO NN"
> 
> and everything inside the brackets would be:
> 
> "0 1 a2 12"
> 
> I have a working solution involving strsplit(), but I wonder if there is a
> more direct way.
> Thanks in advance for any hint,
> Adrian



x <- "A1{0}~B0{1} CO{a2}NN{12}"

The first (everything outside the braces) is a bit easier:

> gsub("\\{[[:alnum:]]*\\}", " ", x)
[1] "A1 ~B0  CO NN "
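
The result above has a doubled space and a trailing space, so it is not
quite the "A1 ~B0 CO NN" that was asked for. A minimal sketch of one way to
tidy it up, assuming R >= 3.2.0 for trimws() (the name `outside` is only
illustrative):

```r
x <- "A1{0}~B0{1} CO{a2}NN{12}"

# Replace each {...} group with a single space, as above
outside <- gsub("\\{[[:alnum:]]*\\}", " ", x)

# Collapse runs of spaces, then drop leading/trailing whitespace
outside <- trimws(gsub(" +", " ", outside))
outside
```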


The second (everything inside the braces), at least using standard functions,
is a bit more cumbersome, because there are several matches to extract:

> gsub("\\{|\\}", "", regmatches(x, gregexpr("\\{[[:alnum:]]+\\}", x))[[1]])
[1] "0"  "1"  "a2" "12"

Note that a multi-element character vector is returned, not a single string.
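
If a single string such as "0 1 a2 12" is wanted, as in the original
question, the pieces can be joined with paste(). A small sketch (the names
`inside` and `inside_str` are only illustrative):

```r
x <- "A1{0}~B0{1} CO{a2}NN{12}"

# Extract the {...} groups, then strip the braces from each one
inside <- gsub("\\{|\\}", "",
               regmatches(x, gregexpr("\\{[[:alnum:]]+\\}", x))[[1]])

# Join the pieces into one space-separated string
inside_str <- paste(inside, collapse = " ")
inside_str
```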

In the above:

> gregexpr("\\{[[:alnum:]]+\\}", x)
[[1]]
[1]  3  9 15 21
attr(,"match.length")
[1] 3 3 4 4
attr(,"useBytes")
[1] TRUE

returns the starting positions and lengths of the matches, which are passed
to regmatches() to get the actual values:

> regmatches(x, gregexpr("\\{[[:alnum:]]+\\}", x))
[[1]]
[1] "{0}"  "{1}"  "{a2}" "{12}"

The gsub() then removes the braces from each returned match.
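
A possible alternative that avoids the separate gsub() step is a Perl-style
pattern with lookbehind and lookahead, so that the braces are required around
the match but are not part of it. This is only a sketch, and it assumes
perl = TRUE is acceptable (the name `inside_perl` is illustrative):

```r
x <- "A1{0}~B0{1} CO{a2}NN{12}"

# (?<=\{) requires a preceding "{", (?=\}) requires a following "}";
# neither brace is included in the matched text
inside_perl <- regmatches(
  x,
  gregexpr("(?<=\\{)[[:alnum:]]+(?=\\})", x, perl = TRUE)
)[[1]]
inside_perl
```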

You could also use invert = TRUE in regmatches() to get the text outside the
braces instead:

> regmatches(x, gregexpr("\\{[[:alnum:]]+\\}", x), invert = TRUE)[[1]]
[1] "A1"  "~B0" " CO" "NN"  ""   
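
The inverted result keeps the leading space in " CO" and ends with an empty
string. If a single string like "A1 ~B0 CO NN" is the goal, one way to get
there is sketched below, assuming R >= 3.2.0 for trimws() (the names `pieces`
and `outside_str` are only illustrative):

```r
x <- "A1{0}~B0{1} CO{a2}NN{12}"

# Text between the {...} matches
pieces <- regmatches(x, gregexpr("\\{[[:alnum:]]+\\}", x),
                     invert = TRUE)[[1]]

# Trim whitespace, drop empty pieces, and join with single spaces
pieces <- trimws(pieces)
outside_str <- paste(pieces[nzchar(pieces)], collapse = " ")
outside_str
```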


Of course, all of this presumes that the braces are not nested and that they
contain only alphanumeric characters.

Regards,

Marc Schwartz

