Both the initial DOCTYPE declaration and the following comment need to be
terminated.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
<!-- saved from url=(0023)http://www.iras.gov.sg/ --
should be
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!-- saved from url=(0023)http://www.iras.gov.sg/ -->
If any browsers manage to read the page, it is probably because they
re-parse it when they find no text on the first pass. Ask whoever wrote
the page to fix it. I actually see several other missing '>'s, any one of
which could cause the same problem.
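
If you want to check a saved page for this kind of damage before handing
it to the parser, something along these lines might help. It is only a
rough Python sketch, not part of HTML-Parser or libwww, and the name
find_unterminated is my own invention. It assumes that a '<!' construct
should reach its '>' before the next '<' appears, so a comment that
legitimately contains '<' will be flagged by mistake.

```python
import re

# Heuristic: "<!" followed by text that reaches the next "<" (or the end of
# the input) without ever hitting a ">" is treated as unterminated.
UNTERMINATED = re.compile(r"<![^<>]*(?=<|\Z)")


def find_unterminated(html):
    """Return (line_number, snippet) for each '<!' construct missing its '>'.

    Hypothetical helper for illustration only; it does not implement real
    SGML comment or declaration parsing.
    """
    hits = []
    for match in UNTERMINATED.finditer(html):
        # Line numbers are 1-based, counted from the start of the input.
        line = html.count("\n", 0, match.start()) + 1
        hits.append((line, match.group().strip()))
    return hits


if __name__ == "__main__":
    broken = (
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"\n'
        "<!-- saved from url=(0023)http://www.iras.gov.sg/ --\n"
        "<HTML>\n"
    )
    for line, snippet in find_unterminated(broken):
        print("line %d: missing '>' after: %s" % (line, snippet))
```

Run against the two broken lines above, it should report both of them;
the corrected versions should produce no reports.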
--
Mac :})
** I may forward private database questions to the DBI mail lists. **
----- Original Message -----
From: "Tan Joo Geok" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, March 28, 2000 1:59 AM
Subject: Cannot parse this web page!
I am using the libwww distribution with HTML-Parser-3.07 to measure the
total time it takes to fetch a URL including all the objects contained in
it. However, there is a Web page (see attached) from which the parser
consistently cannot extract the image objects. I hope somebody would be
able to tell me what is wrong and how to get around the problem.