On Fri, Jul 26, 2002 at 02:02:54PM -0400, Jeff 'japhy' Pinyan wrote:
> 
> It's best to come up with a hash of strings and replacements:
> 
>   my %rep = qw(
>     ldblquote rt_quote
>     rdblquote lt_quote
>     emdash    em_dash
>     rquote    r_quote
>     tab               tab
>     lquote    l_quote
>   );
> 
> Then create a regex:
> 
>   my $rx = join "|", map quotemeta, keys %rep;
> 
> Then use it in a larger regex:
> 
>   $source =~ s[\\($rx) ][<$rep{$1}/>]g;
> 
> Ta da!  ONLY one pass through the string. 

This looks really nice! I'll have to test it with a timer. I'd imagine
it would be much faster because you only make one pass through the
string. On the other hand, doesn't perl have to recompile $rx each time
because it is a variable? After all, $rx might have changed, though in
my case it definitely wouldn't have.
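
On the recompilation question: as far as I know, perl only recompiles an
interpolated pattern when the interpolated string has actually changed
since the last time the match ran, so a constant $rx should not cost a
recompile per match. Wrapping the pattern in qr// makes the compile-once
intent explicit. A minimal sketch, assuming a cut-down two-entry %rep
and a made-up $source (neither is from the real script); sorting the
keys longest-first is an added precaution, not part of the quoted
suggestion:

```perl
use strict;
use warnings;

# Cut-down replacement table, for illustration only.
my %rep = qw(
  emdash em_dash
  tab    tab
);

# Longest keys first, so a short control word cannot shadow a longer
# one that starts with the same letters.
my $alt = join "|", map quotemeta,
          sort { length($b) <=> length($a) } keys %rep;

# qr// compiles the pattern once; later uses of $rx reuse that object.
my $rx = qr/\\($alt) /;

# Made-up input: single quotes keep the backslashes literal.
my $source = 'one\emdash two\tab three';

(my $out = $source) =~ s{$rx}{<$rep{$1}/>}g;
print "$out\n";
```

The capture group lives inside the qr// object, so $1 is still set when
$rx is used as the whole pattern of the substitution.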

> You'll need to beef up the hash
> and the regex as needed, if not everything is '\\IN ' and not every
> replacement is '<OUT/>'.

As a matter of fact, the expressions take only two forms:

\emdash Regular text
\'9oeRegular text

Some of the expressions (the ones for foreign characters) don't have a
space after the control word. So I think that

  $source =~ s[\\($rx)(?:\s)*][<$rep{$1}/>]g;

should work?
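
A quick way to check is to run the pattern over both forms. This is
only a sketch: the convert() helper, the output names, and the \'e9
entry are assumptions made for the test (the hex escape stands in for
any control word with no trailing space), and \s* is used as the
shorter equivalent of (?:\s)*.

```perl
use strict;
use warnings;

# Hypothetical table: the names on the right are placeholders, and the
# hex-escape key is an assumed example of a control word that is not
# followed by a space.
my %rep = (
  "emdash" => "em_dash",
  "'e9"    => "e_acute",
);

# Longest keys first, each key escaped so the quote is taken literally.
my $rx = join "|", map quotemeta,
         sort { length($b) <=> length($a) } keys %rep;

# Hypothetical helper: apply the substitution and return the result.
sub convert {
  my ($source) = @_;
  $source =~ s[\\($rx)\s*][<$rep{$1}/>]g;
  return $source;
}

# Single-quoted strings keep the backslashes literal.
for my $in ('\emdash Regular text', q{\'e9Regular text}, '\emdash   spaced') {
  print convert($in), "\n";
}
```

Both forms are matched. One thing to watch: \s* swallows every
whitespace character after the control word, including newlines and
runs of spaces that may belong to the text, as the third input shows.
If only the single delimiting space should go, " ?" in place of \s*
would be the narrower choice.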

On another note, my script is 1100 lines long, and seems to work.
It seems like there is a need for converting RTF to XML, since the Perl
converters available only convert to HTML.

I would like to release the script at some point, but when I get tips
off this site, I realize how much better an experienced Perl programmer
could do things. It would be much more effective to work on this as part
of a team, but I've never done something like this before. I guess I'll
post feelers on other mailing lists.

(This really should be another thread!)

Thanks!

Paul

-- 

************************
*Paul Tremblay         *
*[EMAIL PROTECTED]*
************************
