On Wed, Mar 31, 2010 at 8:20 AM, Tony B <tony.bre...@googlemail.com> wrote:
> Dear all,
>
> Let's say I have the following:
>
>> x <- c("Eve: Going to try something new today...", "Adam: Hey @Eve, how are
>> you finding R? #rstats", "Eve: @Adam, It's awesome, so much better at
>> statistics that #Excel ever was! @Cain & @Able disagree though :(", "Adam:
>> @Eve I'm sure they'll sort it out :)", "blahblah")
>> x
> [1] "Eve: Going to try something new today..."
> [2] "Adam: Hey @Eve, how are you finding R? #rstats"
> [3] "Eve: @Adam, It's awesome, so much better at statistics that \n#Excel ever was! @Cain & @Able disagree though :("
> [4] "Adam: @Eve I'm sure they'll sort it out :)"
> [5] "blahblah"
>
> I would like to come up with a data frame which looks like this
> (pulling out the usernames and #tags):
>
>> data.frame(Msg = x, Source = c("Eve", "Adam", "Eve", "Adam", NA), Mentions =
>> c(NA, "Eve", "Adam, Cain, Able", "Eve", NA), HashTags = c(NA, "rstats",
>> "Excel", NA, NA))

You can do this pretty easily with the stringr package:

library(stringr)
# Note the character class is [a-zA-Z]; [a-zA-z] would also match
# the punctuation that sits between "Z" and "a" in ASCII ([ \ ] ^ _ `).
str_extract_all(x, "@[a-zA-Z]+")
sapply(str_extract_all(x, "@[a-zA-Z]+"), str_c, collapse = ", ")

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
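
The reply above covers the @mentions only. As a rough sketch of how the same idea might extend to the whole data frame asked for, here is one possible version. It uses base R's regmatches() and gregexpr() instead of stringr so that it runs without extra packages. The helper name extract_tags and the variable names src and result are made up for illustration and are not part of the original thread. It also assumes that user names and tags consist of ASCII letters only, and that a message source is whatever precedes the first colon at the start of the string.

```r
# Sample data from the question.
x <- c("Eve: Going to try something new today...",
       "Adam: Hey @Eve, how are you finding R? #rstats",
       "Eve: @Adam, It's awesome, so much better at statistics that #Excel ever was! @Cain & @Able disagree though :(",
       "Adam: @Eve I'm sure they'll sort it out :)",
       "blahblah")

# Hypothetical helper (not from the thread): find every match of `pattern`
# in each string, drop the leading marker character ("@" or "#"), and
# collapse the matches into one comma-separated string (NA if none).
extract_tags <- function(strings, pattern) {
  matches <- regmatches(strings, gregexpr(pattern, strings))
  vapply(matches, function(m) {
    if (length(m) == 0) NA_character_ else paste(substring(m, 2), collapse = ", ")
  }, character(1))
}

# Source: the name before the first colon; NA when there is no "Name:" prefix.
# Assumption: names are ASCII letters only.
has_source <- grepl("^[A-Za-z]+:", x)
src <- ifelse(has_source, sub("^([A-Za-z]+):.*$", "\\1", x), NA_character_)

result <- data.frame(
  Msg      = x,
  Source   = src,
  Mentions = extract_tags(x, "@[A-Za-z]+"),
  HashTags = extract_tags(x, "#[A-Za-z]+"),
  stringsAsFactors = FALSE
)
result
```

If names or tags can contain digits or underscores, the character classes would need widening (for example "@[A-Za-z0-9_]+"); that is a guess about the data, not something the thread specifies.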