It seems that the string "<>" at the end of an HTML document
is interpreted as a comment, and not as text.

Is this a bug, or is this some obscure syntax in HTML that I do
not know about?

(In my program I frequently need to parse some short string
as HTML.  Like "Press the buttons <>".  Bingo.)

Here is a small program that demonstrates the behaviour.
Merely adding a space or newline at the end of the string makes
the final token a text token ("T") again.


$ perl -MHTML::TokeParser -le '
    $s = "<>xx<>";
    $p = HTML::TokeParser->new(\$s);
    print join "\t", @$t    while ($t=$p->get_token());
    '
T       <>xx
C       <>



--
Paul Bijnens, xplanation Technology Services        Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/          email:  [EMAIL PROTECTED]
***********************************************************************
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, ^^, *
* F6, quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* init 0, kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ... *
* ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out          *
***********************************************************************
