Re: [R] regexpr
Using lapply() is great. That helped me a lot, thanks.

--
View this message in context: http://www.nabble.com/regexpr-tf4000743.html#a11412603
Sent from the R help mailing list archive at Nabble.com.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] regexpr
Hi,

I'd like to match each member of a list to a target string, e.g.

--
    mylist=c("MN","NY","FL")
    g=regexpr(mylist[1], "Those from MN:")
    if (g>0)
    {
        "On list"
    }
--

My question is: how do I add an end-of-string symbol '$' to the to-match string, so that 'M' won't match? Of course, "MN$" will work, but I want to use it in a loop; mylist[i] is what I need. I tried mylist[1]$, but it didn't work. So why doesn't it extrapolate? How do I do it?

Thanks a lot!

--
View this message in context: http://www.nabble.com/regexpr-tf4000743.html#a11363041
Sent from the R help mailing list archive at Nabble.com.
Re: [R] regexpr
    mylist=c("MN","NY","FL")
    g=regexpr(paste(mylist[1], "$", sep=""), "Those from MN:")
    if (g>0)
    {
        "On list"
    }

or in a loop

    for (i in mylist){
        if (regexpr(paste(i, "$", sep=""), "Those from MN:") > 0){
            # ...code for those from i
        }
    }

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
Re: [R] regexpr
I think you are looking for paste().

And you can replace your for loop with lapply(), which will apply regexpr to every element of 'mylist' (as the first argument, which is 'pattern'). 'text' can be a vector also:

    mylist <- c("MN","NY","FL")
    lapply(paste(mylist,"$",sep=""),regexpr,text="Those from MN:")
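A minimal sketch of the paste() idea above, assuming a reasonably recent R where paste0() and vapply() are available. The `targets` vector is invented for illustration and is not from the thread. It builds one end-anchored pattern per code and tests each pattern against each target. Note that "$" only matches when the code is the very last thing in the string, so a target with a trailing colon such as "Those from MN:" is not expected to match "MN$".

```r
# State codes from the thread
mylist <- c("MN", "NY", "FL")

# Build one end-anchored pattern per code: "MN$", "NY$", "FL$"
patterns <- paste0(mylist, "$")

# Hypothetical target strings (assumption, not from the original post)
targets <- c("Those from MN", "Those from MN:", "NY")

# For each pattern, test every target; grepl() returns TRUE/FALSE per target
hits <- vapply(patterns, grepl, logical(length(targets)), x = targets)
rownames(hits) <- targets

print(hits)
```

The result is a logical matrix with one row per target and one column per pattern, which is often easier to work with than the list of regexpr() results that lapply() returns.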
[R] regexpr and parsing question
The main problem I am trying to solve is this:

I am importing a tab delimited file whose first line contains only one column, which is a descriptor of the form "col_1 col_2 col_3", i.e. the colnames are not tab delimited but are separated by whitespace. I would like to parse this first line so that it becomes the colnames of the rest of the file, which I am reading into R using read.delim(). The file is so huge that I must do this in R.

My first question is this: What is the best way to accomplish what I want to do?

My other questions revolve around some failed attempts on my part to solve the problem on my own using regular expressions. I thought that perhaps I could change the first line to c("col_1", "col_2", "col_3") using gsub. I was having trouble figuring out how R uses the backslash character, because I know that sometimes the backslash one would use in Perl needs to be a double backslash in R.

Here is a sample of what I tried and what I got:

    > a<-"col_1 col_2 col_3"
    > gsub("\\s", " ", a)
    [1] "col_1 col_2 col_3"
    > gsub("\\s", "\\s", a)
    [1] "col_1scol_2scol_3"

As you can see, it looks like R is taking a regular expression for pattern, but not taking it for replacement. Why is this?

Assuming that I did want to solve my original problem with gsub and then turn the string into an R object, how would I get gsub to return c("col_1", "col_2", "col_3") using my original string?

Finally, is there a way to declare a string as a regular expression so that R sees it the same way other languages, such as Perl, do, i.e. make the backslash be interpreted the same way? For someone who is just learning regular expressions as I am, it is very frustrating to read about them in references and then have to translate what I've learned into R syntax. I was thinking that instead of enclosing the string in "", one could use THIS.IS.A.REGULAR.EXPRESSION(), similar to the way we use I() in formulae.

These are a bunch of questions, but obviously I have a lot to learn!

Thanks,

Mark

Mark W. Kimpel MD
(317) 490-5129 Work, Mobile
(317) 663-0513 Home (no voice mail please)
1-(317)-536-2730 FAX
Re: [R] regexpr and parsing question
Both spaces and tabs are whitespace, so this should be good enough (unless you can have empty fields):

    read.table("myfile.dat", header = TRUE)

See the sep= argument in ?read.table.

Although I don't think you really need this, here are some regular expressions for processing a header into the form you asked for. The first line places quotes around the names, the second one inserts commas and the last one adds c( and ).

    s <- gsub('(\\S+)', '"\\1"', 'col1 col2 col3')
    s <- gsub("(\\S+) ", "\\1, ", s)
    sub("(.*)", "c(\\1)", s)
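To make the three substitutions above easier to follow, here is a small sketch that keeps every intermediate result and prints it with cat(), so the real characters are visible rather than print()'s escaped form. The variable names (`header`, `s1`, `s2`, `s3`, `cols`) are my own, and the final eval(parse()) step is an assumption about how one might turn the string into an R object; it is not part of the original reply.

```r
header <- "col1 col2 col3"

# Step 1: wrap every run of non-whitespace characters in double quotes
s1 <- gsub("(\\S+)", "\"\\1\"", header)

# Step 2: insert a comma after each quoted name that is followed by a space
s2 <- gsub("(\\S+) ", "\\1, ", s1)

# Step 3: wrap the whole string in c( ... )
s3 <- sub("(.*)", "c(\\1)", s2)

# cat() shows the characters actually stored in each string
cat(s1, s2, s3, sep = "\n")

# The final string is valid R code, so it can be parsed and evaluated
cols <- eval(parse(text = s3))
print(cols)
```

In practice, splitting the header with strsplit() or scan() gets to the same character vector without building and evaluating code.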
Re: [R] regexpr and parsing question
On Tue, 2007-01-30 at 17:23 -0500, Kimpel, Mark William wrote:
> The main problem I am trying to solve is this:
>
> I am importing a tab delimited file whose first line contains only one column, which is a descriptor of the form "col_1 col_2 col_3", i.e. the colnames are not tab delimited but are separated by whitespace. I would like to parse this first line so that it becomes the colnames of the rest of the file, which I am reading into R using read.delim(). The file is so huge that I must do this in R.
>
> My first question is this: What is the best way to accomplish what I want to do?

Mark,

The first thing that comes to mind is a two pass approach on the file:

First pass: (using example file with your first line)

    # Get the first line into a vector to set the colnames for the DF
    # during the second pass
    > ColNames <- unlist(read.table("test.txt", nrows = 1, as.is = TRUE))
    > str(ColNames)
     Named chr [1:3] "col_1" "col_2" "col_3"
     - attr(*, "names")= chr [1:3] "V1" "V2" "V3"

Second pass:

    # Now read the rest of the file, skipping the first line.
    # header = FALSE keeps the first data row from being consumed as a header.
    DF <- read.delim("test.txt", header = FALSE, skip = 1, col.names = ColNames)

I believe that should get you the full data set and set the colnames based upon the first line.

This should pretty much obviate the need for everything below here.

> My other questions revolve around some failed attempts on my part to solve the problem on my own using regular expressions. I thought that perhaps I could change the first line to c("col_1", "col_2", "col_3") using gsub. I was having trouble figuring out how R uses the backslash character, because I know that sometimes the backslash one would use in Perl needs to be a double backslash in R.

You would not want to change the first line as you have it above, as it would not be parsed properly using read.table() family functions.

> Here is a sample of what I tried and what I got:
>
>     > a<-"col_1 col_2 col_3"
>     > gsub("\\s", " ", a)
>     [1] "col_1 col_2 col_3"
>     > gsub("\\s", "\\s", a)
>     [1] "col_1scol_2scol_3"
>
> As you can see, it looks like R is taking a regular expression for pattern, but not taking it for replacement. Why is this?

There are various settings for how regex are interpreted by/within R. See ?grep and note the various arguments to the functions there and how they impact R's behavior here.

Also, note that there is a difference (to further complicate your life...) between the characters that R displays by default using print() and how they are displayed using cat(). See below.

    > a
    [1] "col_1 col_2 col_3"
    > gsub(" ", ", ", a)
    [1] "col_1, col_2, col_3"

or to get you to your vector statement above:

Note the result here:

    > paste("c(\"", gsub(" ", "\", \"", a), "\")", sep = "")
    [1] "c(\"col_1\", \"col_2\", \"col_3\")"

Now see how it displays when the escaped double quote chars are interpreted properly using cat():

    > cat(paste("c(\"", gsub(" ", "\", \"", a), "\")", sep = ""), "\n")
    c("col_1", "col_2", "col_3")

> Assuming that I did want to solve my original problem with gsub and then turn the string into an R object, how would I get gsub to return c("col_1", "col_2", "col_3") using my original string?

Again, note the two pass solution above. It's easier, unless you would want to consider using awk/sed from a CLI, which I generally avoid at all costs...

> Finally, is there a way to declare a string as a regular expression so that R sees it the same way other languages, such as Perl, do, i.e. make the backslash be interpreted the same way? For someone who is just learning regular expressions as I am, it is very frustrating to read about them in references and then have to translate what I've learned into R syntax. I was thinking that instead of enclosing the string in "", one could use THIS.IS.A.REGULAR.EXPRESSION(), similar to the way we use I() in formulae.

Part of the challenge is noting the different behaviors of regex within R and how that behavior is affected by the aforementioned arguments. Also, noting how the output is displayed within R relative to the interpretation of escaped characters, as is seen above.

> These are a bunch of questions, but obviously I have a lot to learn!
>
> Thanks,
>
> Mark

HTH,

Marc Schwartz
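The two pass idea above can be tried without the real file. The sketch below is self-contained: it writes a tiny stand-in file to a temporary path (the contents and the names `path`, `col_names` and `df` are invented for illustration), reads the header line with scan() instead of read.table(), and then reads the tab-delimited body. Using scan() here is my substitution, not the method from the reply; it simply returns an unnamed character vector directly.

```r
# Build a small stand-in file: a space-separated header line followed by
# tab-delimited data (hypothetical contents, for illustration only)
path <- tempfile(fileext = ".txt")
writeLines(c("col_1 col_2 col_3",
             "1\t2\t3",
             "4\t5\t6"), path)

# First pass: read only the header line and split it on whitespace
col_names <- scan(path, what = character(), nlines = 1, quiet = TRUE)

# Second pass: skip the header line and read the tab-delimited body.
# header = FALSE matters: read.delim() otherwise treats the first
# remaining line as a header and drops it from the data.
df <- read.delim(path, header = FALSE, skip = 1, col.names = col_names)

print(df)

# Clean up the temporary file
unlink(path)
```

Only the first line is parsed twice, so the cost of the extra pass should stay small even for a very large file.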
Re: [R] regexpr and parsing question
And here is an alternative to the regular expressions (although again I don't think you really need any of this):

    > capture.output(dput(strsplit("col1 col2 col3", " ")[[1]]))
    [1] "c(\"col1\", \"col2\", \"col3\")"
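On the backslash questions in this thread, a short sketch may help separate two layers: what the R parser does to a string literal, and what the regex engine then sees. It also shows why a replacement of "\\s" produced a plain "s": as far as I know, the replacement argument of gsub() is not a regular expression, and only backreferences such as "\\1" are special there. The raw string literal r"(...)" is an assumption about the reader's setup, since it needs R 4.0.0 or later and did not exist when this thread was written.

```r
# In an ordinary string literal, "\\s" is two characters: a backslash and "s"
p <- "\\s"
n_chars <- nchar(p)

print(p)       # print() re-escapes the backslash for display
cat(p, "\n")   # cat() shows the characters actually stored

# R >= 4.0.0 only (assumption): a raw string needs no doubled backslashes
p_raw <- r"(\s)"
same_pattern <- identical(p, p_raw)

# Either form works as a pattern; the replacement is taken literally
a <- "col_1 col_2 col_3"
out <- gsub(p_raw, ", ", a)
cat(out, "\n")
```

So the regex itself is the same as in Perl; the doubling only comes from writing it inside a quoted R string.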
Re: [R] regexpr and portability issue
On Tue, 2 Aug 2005, Marco Blanchette wrote:

> I am still forging my first arms with R and I am fighting with regexpr() as well as portability between unix and windoz. I need to extract barcodes from filenames (which are located between a double and single underscore) as well as the directory where the filename is residing. Here is the solution I came to:
>
>     aFileName <- "/Users/marco/Desktop/diagnosticAnalysis/test/MA__251329410021_S01_A01.txt"
>     t <- regexpr("__\\d*_",aFileName, perl=T)
>     t.dir <- regexpr("^.*/", aFileName, perl=T)
>     base.name <- substr(aFileName, t+2, t-2 + attr(t,"match.length"))
>     base.dir <- substr(aFileName, t.dir, attr(t.dir,"match.length"))
>
> My questions are:
> 1) Is there a more elegant way to deal with regular expressions (read here: more easier, more like perl style).

Yes, use sub and backreferences. An example from the R sources doing something similar:

    wfile <- sub("/chm/([^/]*)$", "", file)
    thispkg <- sub(".*/([^/]*)/chm/([^/]*)$", "\\1", file)

However, R does have functions basename() and dirname() to do this!

> 2) I have a portability problem when I extract the base.dir. Windoz is using '\' instead of '/' to separate directories.

That is misinformation: Windows (sic) accepts either / or \ (see the rw-FAQ and the R FAQ). Use chartr("\\", "/", path) to map \ to /.

The `portability problem' appears to be of your own making -- take heart that R itself manages to manipulate filepaths portably.

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
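A brief sketch of the chartr(), basename() and dirname() suggestions above. The Windows-style path is hypothetical (only the file name is taken from the thread), and the variable names are mine. Each backslash has to be doubled inside the string literal, which is also why the first argument to chartr() is written "\\".

```r
# A Windows-style path written with backslashes (hypothetical directory;
# each backslash is doubled in the string literal)
win_path <- "C:\\Users\\marco\\test\\MA__251329410021_S01_A01.txt"

# Map every backslash to a forward slash
path <- chartr("\\", "/", win_path)

# basename() and dirname() split off the file name and the directory
file_name <- basename(path)
dir_name  <- dirname(path)

cat(path, file_name, dir_name, sep = "\n")
```

Once the separators are forward slashes, the same code should behave the same way on Unix-alikes and on Windows.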
[R] regexpr and portability issue
Dear all--

I am still forging my first arms with R and I am fighting with regexpr() as well as portability between unix and windoz. I need to extract barcodes from filenames (which are located between a double and single underscore) as well as the directory where the filename is residing. Here is the solution I came to:

    aFileName <- "/Users/marco/Desktop/diagnosticAnalysis/test/MA__251329410021_S01_A01.txt"
    t <- regexpr("__\\d*_",aFileName, perl=T)
    t.dir <- regexpr("^.*/", aFileName, perl=T)
    base.name <- substr(aFileName, t+2, t-2 + attr(t,"match.length"))
    base.dir <- substr(aFileName, t.dir, attr(t.dir,"match.length"))

My questions are:
1) Is there a more elegant way to deal with regular expressions (read here: more easier, more like perl style).
2) I have a portability problem when I extract the base.dir. Windoz is using '\' instead of '/' to separate directories.

Any suggestions/comments

Many Tx

Marco Blanchette, Ph.D.
[EMAIL PROTECTED]
Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204
Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062
Re: [R] regexpr and portability issue
Try this. The regular expression says to match
- anything
- followed by a double underscore
- followed by one or more digits
- followed by an underscore
- followed by anything.

The digits have been parenthesized so that they can be referred to in the backreference \\1. Also use the R function dirname rather than regular expressions.

    base.name <- sub(".*__([[:digit:]]+)_.*", "\\1", aFileName)
    base.dir <- dirname(aFileName)
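Here is the backreference approach above as a runnable sketch, next to an alternative that keeps the regexpr() result and extracts the match with regmatches(). The regmatches() variant is my addition, not something proposed in the thread, and `barcode`, `barcode2`, `m` and `hit` are illustrative names. One caveat worth knowing: sub() returns its input unchanged when the pattern does not match, so a file name without a barcode would come back whole.

```r
aFileName <- "/Users/marco/Desktop/diagnosticAnalysis/test/MA__251329410021_S01_A01.txt"

# Capture the digits between "__" and "_" and keep only that group
barcode <- sub(".*__([[:digit:]]+)_.*", "\\1", aFileName)

# Alternative: locate the match with regexpr(), extract it with regmatches(),
# then strip the surrounding underscores
m <- regexpr("__[[:digit:]]+_", aFileName)
hit <- regmatches(aFileName, m)
barcode2 <- gsub("_", "", hit)

# The directory part needs no regular expression at all
base.dir <- dirname(aFileName)

cat(barcode, barcode2, base.dir, sep = "\n")
```

Both routes give the same barcode here; the regmatches() route returns an empty vector instead of the original string when nothing matches, which can be easier to check for.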
Re: [R] Regexpr with .
Trevor,

The . is a regex meta-character that matches any character. In order to look specifically for a ".", you must escape it with a \, and that \ must also be escaped, thus:

    > regexpr("\\.", "Female.Alabama")
    [1] 7
    attr(,"match.length")
    [1] 1

HTH

steve
Re: [R] Regexpr with .
> I'm trying to use the regexpr function to locate the decimal in a character string. Regardless of the position of the decimal, the function returns 1.

You need to escape it.

    > gsub("\\.",",","Female.Alabama")
    [1] "Female,Alabama"
Re: [R] Regexpr with .
Thompson, Trevor wrote:
> I'm trying to use the regexpr function to locate the decimal in a character string. Regardless of the position of the decimal, the function returns 1. For example,
>
>     > regexpr(".", "Female.Alabama")

You probably want backslashes to indicate that . should not be treated as a metacharacter; it should be taken literally.

    > regexpr("\\.", "Female.Alabama")
    [1] 7
    attr(,"match.length")
    [1] 1

hope this helps,

Chuck Cleland
[R] Regexpr with .
I'm trying to use the regexpr function to locate the decimal point in a character string. Regardless of the position of the decimal point, the function returns 1. For example,

    > regexpr(".", "Female.Alabama")
    [1] 1
    attr(,"match.length")
    [1] 1

In trying to figure out what was going on here, I tried the command below:

    > gsub(".", ",", "Female.Alabama")
    [1] ",,,,,,,,,,,,,,"

It looks like R is treating every character in the string as if it were a decimal point. I didn't see anything in the help file about "." being some kind of special character. Any idea why R is treating a decimal point this way in these functions? Any suggestions how to get around this?

Thanks for any suggestions.

-Trevor
Re: [R] Regexpr with .
On 13-Aug-03 Barry Rowlingson wrote:
> Thompson, Trevor wrote:
>> I didn't see anything in the help file about "." being some kind of special character. Any idea why R is treating a decimal this way in these functions? Any suggestions how to get around this?
>
> '.' is the regexpr character for matching any single character!
>
>     > regexpr("a.e", "Female.Alabama")
>     [1] 4
>
> To actually search for a dot, you need to 'escape' it with a backslash, but of course the backslash needs escaping itself, with another backslash. Luckily that backslash doesn't need escaping, otherwise we would quickly run out of patience.
>
>     > regexpr("\\.", "Female.Alabama")
>     [1] 7

It's also worth remembering the use of [], normally used to enclose a disjunctive list of characters to match (e.g. [Aa] matches either A or a) or a range (e.g. [0-9] matches any digit). Any metacharacter occurring within will be interpreted literally, with the exceptions of \ and (for obvious reasons) ], which must be escaped (in which case the use of [] is redundant); -- however, [ works!

    > regexpr("a.e", "Female.Alabama")
    [1] 4
    attr(,"match.length")
    [1] 3
    > regexpr("[.]", "Female.Alabama")
    [1] 7
    attr(,"match.length")
    [1] 1
    > regexpr("[[]", "Female[Alabama")
    [1] 7
    attr(,"match.length")
    [1] 1
    > regexpr("[\\]", "Female\\Alabama")
    [1] 7
    attr(,"match.length")
    [1] 1
    > regexpr("[\]]", "Female]Alabama")
    [1] 7
    attr(,"match.length")
    [1] 1

Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 167 1972
Date: 13-Aug-03 Time: 22:14:06
-- XFMail --
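A compact comparison of the ways to match a literal dot discussed in this thread, plus fixed = TRUE, which is my addition and switches off regex interpretation entirely. The variable names are illustrative. The last line relies on the POSIX convention that a "]" placed first inside a bracket expression is taken literally; I use that form because, as far as I know, current R versions reject "\]" in a string literal as an unrecognized escape, so the last transcript line above may not parse as written today.

```r
x <- "Female.Alabama"

# Three ways to find a literal dot
pos_escape  <- regexpr("\\.", x)              # backslash escape
pos_bracket <- regexpr("[.]", x)              # dot inside a character class
pos_fixed   <- regexpr(".", x, fixed = TRUE)  # no regex interpretation at all

# as.integer() drops the match.length attribute for easy comparison
positions <- c(as.integer(pos_escape),
               as.integer(pos_bracket),
               as.integer(pos_fixed))
print(positions)

# A literal "]" can be placed first inside the brackets
pos_close <- as.integer(regexpr("[]]", "Female]Alabama"))
print(pos_close)
```

When the thing being searched for is a fixed piece of text rather than a pattern, fixed = TRUE is often the least error-prone choice.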
RE: [R] Regexpr with .
Try

    regexpr("\\.", "Female.Alabama")
Re: [R] Regexpr with .
Thompson, Trevor wrote:
> It looks like R is treating every character in the string as if it were decimal. I didn't see anything in the help file about "." being some kind of special character. Any idea why R is treating a decimal this way in these functions? Any suggestions how to get around this?

'.' is the regexpr character for matching any single character!

    > regexpr("a.e", "Female.Alabama")
    [1] 4

To actually search for a dot, you need to 'escape' it with a backslash, but of course the backslash needs escaping itself, with another backslash. Luckily that backslash doesn't need escaping, otherwise we would quickly run out of patience.

    > regexpr("\\.", "Female.Alabama")
    [1] 7

Baz
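The behaviour described above can be checked side by side. This sketch contrasts the unescaped and escaped dot on the string from the thread; the variable names are mine.

```r
x <- "Female.Alabama"

# An unescaped "." matches any character, so every character is replaced
all_replaced <- gsub(".", ",", x)

# Escaping the dot restricts the match to the literal "."
dot_replaced <- gsub("\\.", ",", x)

# Position of the first literal dot
dot_pos <- as.integer(regexpr("\\.", x))

cat(all_replaced, dot_replaced, dot_pos, sep = "\n")
```

The first result has as many commas as the input has characters, which is why the original regexpr(".", ...) call always reported position 1.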
Re: [R] Regexpr with .
Try

    regexpr("\\.", "Female.Alabama")

and

    gsub("\\.", ",", "Female.Alabama")

Jianhua Zhang
Department of Biostatistics
Dana-Farber Cancer Institute
44 Binney Street
Boston, MA 02115-6084