On Mon, Jul 15, 2002 at 10:26:25AM +0200, Janek Schleicher wrote:

> 
> To increase speed, we can make also a lookahead statement:
> 
> my @tokens = split / ( \\ (?=\S)            # there's never a whitespace
>                           (?: [^\s{}]+  |
>                               [^\s\\}]+ |
>                               [\\}]       )
>                        | }
>                      )/x
>              => $line;
> 

Can you explain the lookahead statement to me, or better yet,
point me to some good documentation? It is not explained in *Perl
Cookbook.*

As John pointed out, your solution leaves off the leading open
bracket. However, it is about twice as quick as the previous
solution. Your solution tokenizes this line:

{\i italics}

This way:

'{'
'\i'
' italics'

Actually, I can deal with this. I would just have to set a flag
in my program. If the preceding token was '{', then the next
token is part of an opening group.
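
A minimal sketch of that flag idea, assuming the tokens arrive as a
plain list. The name merge_open_groups is made up for illustration,
and I have not checked how it behaves on real RTF beyond the small
case shown:

```perl
use strict;
use warnings;

# Hypothetical helper: glue a lone '{' onto the token that follows it,
# so that ('{', '\i') becomes ('{\i').
sub merge_open_groups {
    my @tokens = @_;
    my @merged;
    my $pending_open = 0;    # flag: the preceding token was '{'

    for my $token (@tokens) {
        if ($token eq '{') {
            # Two '{' in a row: emit the earlier one unchanged.
            push @merged, '{' if $pending_open;
            $pending_open = 1;
            next;
        }
        if ($pending_open) {
            $token = '{' . $token;    # next token joins the opening group
            $pending_open = 0;
        }
        push @merged, $token;
    }

    # A trailing '{' with nothing after it is kept as-is.
    push @merged, '{' if $pending_open;
    return @merged;
}

print "'$_'\n" for merge_open_groups('{', '\i', ' italics');
```

The flag costs one extra pass over the token list, so most of the
speed gain from the faster split should survive.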

However, your solution brings up another problem. It does not
separate true brackets (which convey formatting information)
from escaped brackets (those in the text). Likewise, no
distinction is made between true backslashes and escaped
backslashes.
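
To illustrate the distinction, here is a rough sketch that walks a
line with \G and tries the escaped forms first, so that '\{', '\}'
and '\\' are never mistaken for a true bracket or the start of a
control word. The name classify_rtf and the four labels are
assumptions made for this example, and the control-word pattern is
simplified (letters only). It does not produce the exact token
layout I am after; it only shows how the two kinds can be kept
apart:

```perl
use strict;
use warnings;

# Hypothetical helper: return [kind, text] pairs for one line.
sub classify_rtf {
    my ($line) = @_;
    my @pairs;
    pos($line) = 0;

    while (pos($line) < length($line)) {
        # Order matters: escaped characters are tried before anything else.
        if    ($line =~ /\G(\\[\\{}])/gc)    { push @pairs, [escaped => $1] }
        elsif ($line =~ /\G([{}])/gc)        { push @pairs, [group   => $1] }
        elsif ($line =~ /\G(\\[a-zA-Z]+)/gc) { push @pairs, [control => $1] }
        elsif ($line =~ /\G([^\\{}]+)/gc)    { push @pairs, [text    => $1] }
        else {
            # Anything else (for example a backslash before a digit):
            # consume one character so the loop always advances.
            $line =~ /\G(.)/gcs;
            push @pairs, [other => $1];
        }
    }
    return @pairs;
}

printf "%-8s'%s'\n", @$_ for classify_rtf('\{a {\i b} \\\\c\par');
```

Because every pattern is anchored with \G and uses the /c flag, a
failed alternative leaves the position untouched and the next one
is tried at the same spot.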

Here is a very typical line, and the tokens it should be split
into.

\pard\plain \{All of this text {\i italicized words} is \{\}
\}\{between brackets\} \\escaped_back_slash\par

'\pard'
''
'\plain'
' '
'\{'
'All of this text '
'{\i'
' italicized words'
'}'
' is '
'\{'
''
'\}'
' '
'\}'
''
'\{between'
' brackets'
'\}'
''
'\\'
'\\escaped_back_slash'
'\par'
' 

Thanks

Paul


-- 

************************
*Paul Tremblay         *
*[EMAIL PROTECTED]*
************************
