hi guys --

In a message dated 5/15/2009 8:55:30 PM Eastern Standard Time, williamawalt...@aol.com writes:
> In a message dated 5/15/2009 6:20:40 PM Eastern Standard Time, ari.constan...@gmail.com writes:
>
> > On Fri, May 15, 2009 at 11:18 PM, Barry Brevik <bbre...@stellarmicro.com> wrote:
> >
> > > I am running Active Perl 5.8.8.
> > > ...
> > > Difficulty: the fields contain hundreds of words both preceding and
> > > following the "bad" words, so I have to be able to pick out the
> > > lower-case words that contain one embedded upper-case character.
> > > ...
> > > Barry Brevik
> >
> > Hi Barry,
> >
> > Maybe something like this would help:
> >
> > $ cat test.txt
> > madeStyle
> > facilitatedOne
> > Anti-magneticQuality
> >
> > $ cat test.txt |perl -pe 's/(\w+)([A-Z])/\1\. \2/g'
> > made. Style
> > facilitated. One
> > Anti-magnetic. Quality
> >
> > Regards, Ari Constancio
> > ...
>
> a better approach might be something like:
>
> >cat test.txt | perl -wMstrict -pe "s{ ([[:lower:]]) ([[:upper:]] [[:lower:]]) }{$1. $2}xmsg"
> made. Style
> facilitated. One
> Anti-magnetic. Quality
> 123FOO
>
> hth -- bill walters

well, english is a complicated thing, as, i guess, are all natural languages.

it occurred to me that the solution i suggested, which assumes that a new sentence begins with a uc letter followed by at least one lc letter (that was how i interpreted the original 'lower-case words that contain one embedded upper-case character' spec), fails for a very common word: 'A', as in the 'the endA new' test line below, has no lc letter after its capital.

the approach below makes separate regex definitions for the end-of-sentence and beginning-of-sentence patterns; these are easier to adapt as requirements mature.

of course, the new approach fails for BiCapitalized words such as 'PowerPoint'. sigh. the separate regex definitions might come into play here as well: one might, for instance, define a list of bi-capitalized words and use it in a look-around to avoid improper substitutions. (i cannot think of a case in which a proper sentence ends with anything other than an lc letter before the period. if there is one, the separate-regex approach could, i think, easily be adapted to handle it.)

>cat test.txt
madeStyle
facilitatedOne
Anti-magneticQuality
123FOO
the endA new PowerPoint

>cat test.txt | perl -wMstrict -pe "INIT { my $sen_end = qr{ [[:lower:]] }xms; my $new_sen = qr{ [[:upper:]] }xms; sub S { s{ ($sen_end) ($new_sen) }{$1. $2}xmsg } } S; "
made. Style
facilitated. One
Anti-magnetic. Quality
123FOO
the end. A new Power. Point

again, hth -- bill walters
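The list of bi-capitalized words suggested above could be sketched as a small script. This is a minimal sketch under stated assumptions, not code from the thread: `@BICAPS` and `split_sentences` are hypothetical names, the three listed words are only examples, and instead of a pure look-around it swaps in an alternation that matches a protected word first (so `s///g` skips over it), combined with a lookahead for the upper-case letter. The end-of-sentence and beginning-of-sentence patterns stay separate, as in the INIT one-liner.

```perl
use strict;
use warnings;

# HYPOTHETICAL exception list of BiCapitalized words that must never be
# split; the three entries are examples only.
my @BICAPS = qw(PowerPoint JavaScript YouTube);

# Separate end-of-sentence and beginning-of-sentence patterns, as in the
# INIT one-liner, so each can be adapted on its own.
my $sen_end = qr{ [[:lower:]] }xms;
my $new_sen = qr{ [[:upper:]] }xms;

# One alternation of the protected words, longest first so a shorter entry
# cannot shadow a longer one; (?!) never matches, for an empty list.
my $bicap_re = @BICAPS
    ? join('|', map { quotemeta($_) } sort { length($b) <=> length($a) } @BICAPS)
    : '(?!)';

# split_sentences() is a hypothetical helper name.
sub split_sentences {
    my ($text) = @_;
    # At each position, first try a protected word: it is put back unchanged
    # and the scan resumes after it. Otherwise match a lower-case letter whose
    # following upper-case letter is only looked at (not consumed), and
    # append ". " to the lower-case letter.
    $text =~ s{ ($bicap_re) | ($sen_end) (?= $new_sen ) }
              { defined $1 ? $1 : "$2. " }xmsge;
    return $text;
}

for my $line ('madeStyle', 'facilitatedOne', 'Anti-magneticQuality',
              '123FOO', 'the endA new PowerPoint') {
    print split_sentences($line), "\n";
}
# prints:
#   made. Style
#   facilitated. One
#   Anti-magnetic. Quality
#   123FOO
#   the end. A new PowerPoint
```

Matching each protected word as a whole, rather than writing a variable-length lookbehind, keeps the pattern usable on Perl 5.8.8, where lookbehind must be fixed-length. Because the upper-case letter is only looked at, a protected word glued to the previous word ('thePowerPoint') is still recognized on the next step of the scan.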
_______________________________________________
ActivePerl mailing list
ActivePerl@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs