[R] regular expressions

baptiste auguie Mon, 26 Oct 2009 06:31:13 -0700

Dear list,

I have the following text to parse (read in with readLines, because the
lines have unequal lengths):


st = c("START text1 1 text2 2.3", "whatever intermediate text",
       "START text1 23.4 text2 3.1415")

from which I'd like to extract the lines starting with "START" and
gather the fields that follow into a data.frame of this form:

  text1  text2
     1    2.3
  23.4 3.1415


All the lines containing "START" have the same number of fields, but
this number may vary from file to file.
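A minimal base-R sketch of the first step, assuming the lines are already in a character vector `st` as above: keep only the lines that begin with "START", drop that keyword, split each line on runs of spaces, and check that every START line yields the same number of fields. The names `start_lines`, `fields` and `n_fields` are illustrative only.

```r
st = c("START text1 1 text2 2.3", "whatever intermediate text",
       "START text1 23.4 text2 3.1415")

## keep only the lines beginning with "START"
start_lines = grep("^START", st, value = TRUE)

## drop the "START" keyword, then split each line on runs of spaces
fields = strsplit(sub("^START +", "", start_lines), " +")

## all START lines should have the same number of fields
n_fields = unique(sapply(fields, length))
stopifnot(length(n_fields) == 1)
n_fields
```

Because the field count is computed from the data rather than written into a pattern, the same code works unchanged when a file has more (name, value) pairs.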

I have managed to get this minimal example to work, but I am at a loss
as to how to handle an arbitrary number of (text, value) pairs:

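library(gsubfn)

## capture the four fields that follow "START" (two name/value pairs)
## and stack the matches from each line row-wise into a matrix
( parsed = strapply(st,
    "^START +([[:alnum:]]+) +([0-9.]+) +([[:alnum:]]+) +([0-9.]+)",
    c, simplify = rbind, combine = c) )

## columns 2 and 4 hold the values, columns 1 and 3 the names
d = data.frame(parsed[ ,c(2,4)])
names(d) <- apply(parsed[ ,c(1,3)], 2, unique)
d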
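The captured groups are strings, so the columns of `d` hold text rather than numbers (and, with older R defaults, factors, on which `as.numeric` would return the level codes). A minimal sketch of the conversion; `parsed` is written out by hand here as a stand-in for the strapply result, assumed to match what the regex above extracts from `st`, so that the block runs without gsubfn.

```r
## hand-written stand-in for the character matrix returned by strapply()
parsed = rbind(c("text1", "1",    "text2", "2.3"),
               c("text1", "23.4", "text2", "3.1415"))

## keep the values as character so that as.numeric() parses the text
d = data.frame(parsed[, c(2, 4)], stringsAsFactors = FALSE)
names(d) = apply(parsed[, c(1, 3)], 2, unique)

## convert every column from character to numeric, in place
d[] = lapply(d, as.numeric)
str(d)
```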

## this one has more fields: how do I generalize the regular expression?
st2 = c("START text1 1 text2 2.3 text3 5", "whatever intermediate text",
        "START text1 23.4 text2 3.1415 text3 6")
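One way to avoid a longer regular expression for every file is to skip capture groups for the pairs altogether: split each START line on spaces, then read the odd-numbered fields as column names and the even-numbered fields as values. This is a minimal base-R sketch, not a gsubfn solution, and it assumes that names and values never contain spaces and that every START line lists the same names in the same order. The helper `parse_start` is hypothetical.

```r
st  = c("START text1 1 text2 2.3", "whatever intermediate text",
        "START text1 23.4 text2 3.1415")
st2 = c("START text1 1 text2 2.3 text3 5", "whatever intermediate text",
        "START text1 23.4 text2 3.1415 text3 6")

## hypothetical helper: turn all "START name value name value ..." lines
## into a data.frame, whatever the number of (name, value) pairs
parse_start = function(lines) {
  start_lines = grep("^START", lines, value = TRUE)
  fields = strsplit(sub("^START +", "", start_lines), " +")
  m = do.call(rbind, fields)            # one row per START line
  is_name = seq_len(ncol(m)) %% 2 == 1  # odd columns are names
  ## every START line must use the same names in the same order
  stopifnot(all(apply(m[, is_name, drop = FALSE], 2,
                      function(x) length(unique(x)) == 1)))
  ## even columns are values: convert and rebuild as a numeric matrix
  d = as.data.frame(matrix(as.numeric(m[, !is_name]), nrow = nrow(m)))
  names(d) = m[1, is_name]
  d
}

parse_start(st)
parse_start(st2)
```

The same function handles both `st` and `st2`, since the number of columns comes from the split fields instead of from the pattern.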

Best regards,


Baptiste

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
