Good point, John. Illustrates the danger of assuming there are no "perverse 
cases".

-Don

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062


From: John McKown <john.archie.mck...@gmail.com>
Date: Wednesday, January 14, 2015 at 8:47 AM
To: dh m <macque...@llnl.gov>
Cc: Mark Leeds <marklee...@gmail.com>, 
"r-help-stat.math.ethz.ch" <r-h...@stat.math.ethz.ch>
Subject: Re: [R] regular expression question

On Wed, Jan 14, 2015 at 10:03 AM, MacQueen, Don <macque...@llnl.gov> wrote:
I know you already have a couple of solutions, but I would like to mention
that it can be done in two steps with very simple regular expressions. I
would have done:

s <- c("lngimbintrhofixed","lngimbnointnorhofixed","test",
       'rhofixedtest','norhofixedtest')
res <- gsub('norhofixed$', '',s)
res <- gsub('rhofixed$', '',res)
res
[1] "lngimbint"      "lngimbnoint"    "test"           "rhofixedtest"   "norhofixedtest"


(this is for those of us who don't understand regular expressions very
well!)

There is one possible problem with your solution. Consider the string 
"arhofixednorhofixed". It ends with norhofixed and, according to the original 
specification, should result in "arhofixed". (I will admit this is a contrived 
case which is very unlikely to occur in reality.) But since you apply TWO regular 
expressions, the first removes the trailing norhofixed, giving "arhofixed" 
(the correct answer?), and the second then reduces that to simply "a". The other regular 
expressions correctly remove either norhofixed or rhofixed, if they are written 
_correctly_. That is, they check first for norhofixed, with an alternate of 
rhofixed, or conditionally match the no in front of the rhofixed at the very 
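end of the string (my example). To be even more explicit: the order of the 
alternatives matters whenever two of them can match starting at the same 
position, because a backtracking engine takes the first one that succeeds. 
That cannot happen here, so with the $ anchor both "norhofixed|rhofixed" and 
"rhofixed|norhofixed" work properly: the leftmost match wins, and norhofixed 
starts two characters before the rhofixed it contains. Yes, 
regular expressions can be complicated. Although I have a liking for them due 
to their expressiveness and power, it is like a person using raw nitroglycerin 
instead of dynamite. Dangerous.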
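
For anyone who wants to try the contrived case, here is a minimal sketch. 
The function names two_step and one_pass are made up for illustration; the 
patterns are the ones discussed in this thread, and the input vector is Don's 
with my extra string appended.

```r
# Don's test vector plus the contrived string from this message.
s <- c("lngimbintrhofixed", "lngimbnointnorhofixed", "test",
       "rhofixedtest", "norhofixedtest", "arhofixednorhofixed")

# Two-step approach (hypothetical wrapper name): the second pass also
# strips a "rhofixed" that only became the suffix after the first pass.
two_step <- function(x) {
  x <- sub("norhofixed$", "", x)
  sub("rhofixed$", "", x)
}

# Single pass (hypothetical wrapper name): an optional "no" followed by
# "rhofixed", anchored at the end of the string.
one_pass <- function(x) sub("(no)?rhofixed$", "", x)

two_step(s)  # last element comes back as "a"
one_pass(s)  # last element comes back as "arhofixed"
```

The two functions agree on Don's original five strings and differ only on the 
last one, where the single pass strips exactly one suffix.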



--
While a transcendent vocabulary is laudable, one must be eternally careful so 
that the calculated objective of communication does not become ensconced in 
obscurity.  In other words, eschew obfuscation.

111,111,111 x 111,111,111 = 12,345,678,987,654,321

Maranatha! <><
John McKown
