Re: [R] regular expression help to extract specific strings from text

hadley wickham Wed, 31 Mar 2010 07:53:12 -0700

On Wed, Mar 31, 2010 at 8:20 AM, Tony B <tony.bre...@googlemail.com> wrote:
> Dear all,
>
> Let's say I have the following:
>
>> x <- c("Eve: Going to try something new today...",
>>        "Adam: Hey @Eve, how are you finding R? #rstats",
>>        "Eve: @Adam, It's awesome, so much better at statistics that #Excel ever was! @Cain & @Able disagree though :(",
>>        "Adam: @Eve I'm sure they'll sort it out :)",
>>        "blahblah")
>> x
> [1] "Eve: Going to try something new today..."
> [2] "Adam: Hey @Eve, how are you finding R? #rstats"
> [3] "Eve: @Adam, It's awesome, so much better at statistics that \n#Excel ever was! @Cain & @Able disagree though :("
> [4] "Adam: @Eve I'm sure they'll sort it out :)"
> [5] "blahblah"
>
> I would like to come up with a data frame which looks like this
> (pulling out the usernames and #tags):
>
>> data.frame(Msg = x,
>>            Source = c("Eve", "Adam", "Eve", "Adam", NA),
>>            Mentions = c(NA, "Eve", "Adam, Cain, Able", "Eve", NA),
>>            HashTags = c(NA, "rstats", "Excel", NA, NA))
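For comparison, here is a minimal base-R sketch that builds the whole requested data frame without extra packages, using regmatches() with gregexpr()/regexpr(). It assumes that usernames and tags consist of ASCII letters only, and that the source is the run of letters immediately before a colon at the very start of a message. The helper names collapse_matches and extract_source are hypothetical, made up for this example.

```r
x <- c("Eve: Going to try something new today...",
       "Adam: Hey @Eve, how are you finding R? #rstats",
       "Eve: @Adam, It's awesome, so much better at statistics that #Excel ever was! @Cain & @Able disagree though :(",
       "Adam: @Eve I'm sure they'll sort it out :)",
       "blahblah")

# Hypothetical helper: pull every match of `pattern` out of each message,
# drop the leading sigil (@ or #) and collapse the rest into one string.
# Messages with no match get NA instead of an empty string.
collapse_matches <- function(text, pattern) {
  hits <- regmatches(text, gregexpr(pattern, text, perl = TRUE))
  vapply(hits, function(h) {
    if (length(h) == 0) NA_character_ else paste(substring(h, 2), collapse = ", ")
  }, character(1))
}

# Hypothetical helper: the source is assumed to be the letters before the
# first colon at the start of the message (lookahead keeps the colon out).
extract_source <- function(text) {
  m <- regexpr("^[A-Za-z]+(?=:)", text, perl = TRUE)
  out <- rep(NA_character_, length(text))
  out[m != -1] <- regmatches(text, m)  # regmatches() returns only the hits
  out
}

tweets <- data.frame(
  Msg      = x,
  Source   = extract_source(x),
  Mentions = collapse_matches(x, "@[A-Za-z]+"),
  HashTags = collapse_matches(x, "#[A-Za-z]+"),
  stringsAsFactors = FALSE  # keep character columns on R < 4.0 as well
)
tweets
```

Returning NA rather than "" for messages without mentions or tags matches the NA entries in the target data frame above; usernames containing digits or underscores would need a wider character class.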


You can do this pretty easily with the stringr package:

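library(stringr)
# Every @mention in each message (a list with one element per message)
str_extract_all(x, "@[a-zA-Z]+")
# Collapse each message's mentions into a single comma-separated string
sapply(str_extract_all(x, "@[a-zA-Z]+"), str_c, collapse = ", ")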
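The capital Z in the character class matters. Written as [a-zA-z], the range A-z runs from code point 65 to 122 and so also admits the ASCII characters [ \ ] ^ _ and the backtick. A small base-R check, using perl = TRUE so that ranges are compared by code point (without it, how a range is interpreted can depend on the locale); the example message is made up for illustration:

```r
good <- "@[a-zA-Z]+"   # letters only
bad  <- "@[a-zA-z]+"   # A-z also spans [ \ ] ^ _ and the backtick

msg <- "Thanks @Eve_ and @Adam^"

# gregexpr() finds all matches; regmatches() returns the matched substrings
good_hits <- regmatches(msg, gregexpr(good, msg, perl = TRUE))[[1]]
bad_hits  <- regmatches(msg, gregexpr(bad,  msg, perl = TRUE))[[1]]

print(good_hits)  # the punctuation is left out of the usernames
print(bad_hits)   # the trailing _ and ^ are swallowed into the usernames
```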

Hadley



-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
