On Sun, Nov 14, 2010 at 11:42 AM, Zachary Brooks <zbro...@email.arizona.edu> wrote:
> Hello again,
>
> Yesterday I had a question on pattern matching. A couple of people
> responded with very useful information. After some finagling, I got my
> rudimentary code to work. I'm a PhD student studying computational
> linguistics without any formal programming training. While there are
> various modules that can be applied to my questions, our professor wants
> us to manually code things so we understand the wider problems of
> computational linguistics. With that, here is what I'm trying to do.
>
> In a given file, I believe it was XML originally, insert <s> at the
> beginning of every sentence and </s> at the end of every sentence. So far,
> I've got the following. The output is in *bold*.
>

first, forget about testing tons of regex in programs. if you're trying to
learn, it'll make you go nuts. try something like http://regexpal.com/ or
google for other 'regex tester' sites. there are also programs (ymmv).
btw, i don't see your bold...

> $hello = "This is some sample text.";
>
> $hello =~ s/^../<s>/gi;
> $hello =~ s/..$/<\/s>/gi;
>

second, why not use a placeholder (a capture group) like someone
recommended yesterday? something like: s/^(.+)$/<s>$1<\/s>/
(use $1 rather than \1 on the replacement side; Perl warns about \1 there
under "use warnings". the /g does nothing useful with ^ and $ anchors.)

>
> print "$hello\n";
>
> *<s>is is some sample tex</s>*
>
> I can see why this is happening. I'm telling the program to do exactly what
> it did. But what I want the output to look like is this.
>
> *<s> This is some sample text.</s>*
>
> Any comments are very appreciated. This is a very helpful crowd.
>
> Cheers.
>
> Zach
>
> --
> ------------------------------------------------------------------------
> Zachary S. Brooks
> PhD Student in Second Language Acquisition and Teaching (SLAT)
> The University of Arizona - http://www.coh.arizona.edu/slat/
> Graduate Associate in Teaching - Department of English
> M.A. Applied Linguistics - University of Massachusetts Boston
> ------------------------------------------------------------------------
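putting both ideas together, here's a rough sketch (one way to do it, not
the way) that wraps a single string using a capture group and then tags every
sentence in a file. the sentence-boundary rule is my own naive assumption for
illustration: a sentence ends at '.', '!' or '?' followed by whitespace, so
abbreviations like "Mr." or "e.g." will get split in the wrong place.
wrap_sentence and tag_sentences are just names i made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Wrap one string in <s>...</s>. The capture group keeps the whole text in
# $1, so nothing is overwritten the way s/^../<s>/ eats the first two
# characters. The /s flag lets . match newlines as well.
sub wrap_sentence {
    my ($text) = @_;
    $text =~ s/^(.+)$/<s>$1<\/s>/s;
    return $text;
}

# Tag every sentence in a chunk of text. ASSUMPTION (naive rule, for
# illustration only): a sentence ends at '.', '!' or '?' followed by
# whitespace. Abbreviations like "Mr." or "e.g." will be split wrongly.
sub tag_sentences {
    my ($text) = @_;
    $text =~ s/\s+/ /g;     # collapse newlines/tabs so a sentence can span lines
    $text =~ s/^ | $//g;    # trim the single leading/trailing space left over
    my @sentences = split /(?<=[.!?]) /, $text;
    return join ' ', map { wrap_sentence($_) } grep { length } @sentences;
}

if (@ARGV) {
    # e.g. perl tag.pl input.txt  (input.txt is a hypothetical file name)
    local $/;               # slurp mode: read the whole file at once
    my $text = <>;
    print tag_sentences($text), "\n";
}
else {
    print wrap_sentence("This is some sample text."), "\n";
    print tag_sentences("This is one. Is this two? Yes!"), "\n";
}
```

run with no arguments, it prints:

<s>This is some sample text.</s>
<s>This is one.</s> <s>Is this two?</s> <s>Yes!</s>

if you want the space after <s> like in your example, change the replacement
to <s> $1<\/s>.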