Dear list,

I have the following text to parse (it comes from readLines, as some lines have unequal size),

st = c("START text1 1 text2 2.3",
       "whatever intermediate text",
       "START text1 23.4 text2 3.1415")

from which I'd like to extract the lines starting with "START", and group the subsequent fields in a data.frame in this format:

text1  text2
    1  2.3
 23.4  3.1415

All the lines containing "START" have the same number of fields, but this number may vary from file to file.

I have managed to get this minimal example to work, but I am at a loss as to how to handle an arbitrary number of (text, value) pairs,

library(gsubfn)

( parsed = strapply(st, "^START +([[:alnum:]]+) +([0-9.]+) +([[:alnum:]]+) +([0-9.]+)",
                    c, simplify = rbind, combine = c) )

d = data.frame(parsed[ , c(2, 4)])
names(d) <- apply(parsed[ , c(1, 3)], 2, unique)
d

## this one has more fields: how do I generalize the regular expression?
st2 = c("START text1 1 text2 2.3 text3 5",
        "whatever intermediate text",
        "START text1 23.4 text2 3.1415 text3 6")

Best regards,

Baptiste

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
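
One possible way to handle an arbitrary number of (text, value) pairs is sketched below. It is only a sketch under stated assumptions: it swaps the gsubfn regular expression for plain base R (grep and strsplit), and it assumes that fields are separated by spaces, that names and values strictly alternate after "START", and that every value is numeric. The object names keep, fields, m and d2 are made up for illustration.

```r
st2 <- c("START text1 1 text2 2.3 text3 5",
         "whatever intermediate text",
         "START text1 23.4 text2 3.1415 text3 6")

# Keep only the lines that begin with "START"
keep <- grep("^START", st2, value = TRUE)

# Split each kept line on runs of spaces and drop the leading "START" token
fields <- lapply(strsplit(keep, " +"), function(x) x[-1])

# Stack into a character matrix (assumes every START line has the same
# number of fields): odd columns hold names, even columns hold values
m <- do.call(rbind, fields)
name_cols  <- seq(1, ncol(m), by = 2)
value_cols <- seq(2, ncol(m), by = 2)

# Convert the value columns to numeric, keeping one row per START line,
# and label the columns with the names found on the first START line
vals <- matrix(as.numeric(m[, value_cols]), nrow = nrow(m))
d2 <- as.data.frame(vals)
names(d2) <- m[1, name_cols]
d2
```

Because the number of columns is read from the data rather than written into a pattern, the same lines should also work on the two-pair vector st. Unlike the strapply version, this takes the column names from the first START line only and does not check that later lines repeat them.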