[EMAIL PROTECTED] wrote:
> 
> http://moneycentral.msn.com/companyreport?Symbol=BBBY
> 
> I can't validate it and my standard Python XML parsing tools don't work on it.
> 
> If this was just some teenager's web site I'd move on.  Is there any hope
> of avoiding regular expression hacks to extract the data from this page?

lynx -dump http://moneycentral.msn.com/companyreport?Symbol=BBBY
worked for me.
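
If you would rather stay in Python than shell out to lynx, something
along these lines might do a similar job. It is only a minimal sketch
using the standard library's html.parser, which tolerates tag soup; the
TextExtractor class, the html_to_text helper and the sample markup are
made up for illustration and are not taken from the actual MSN page.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text chunks from HTML, tolerating unclosed tags."""

    SKIP = {"script", "style"}  # contents of these tags are not page text

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def html_to_text(html):
    """Return the non-empty text chunks found in an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical sample (not the real page): note the unclosed <td> and <p>.
sample = (
    "<html><body><table><tr><td>Symbol<td>BBBY</tr></table>"
    "<p>Bed Bath &amp; Beyond<script>var x = 1;</script></body></html>"
)
print(html_to_text(sample))
```

To run it against the real page you would first have to fetch the HTML
(for example with urllib.request) and then decide which chunks you care
about; that part depends on the page layout and is not shown here.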

I understand that lynx is not an XML tool, but I would say that this is
not an XML document, and file(1) does not think so either:

% file companyreport\?Symbol=BBBY 
companyreport?Symbol=BBBY: ASCII HTML document text, with very long lines, with CRLF, LF line terminators

Because, you know, XML documents normally start with an XML declaration:

<?xml ... ?>
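
A quick way to see why XML tools choke on a page like this is to ask an
XML parser directly whether the text is well-formed. The sketch below
uses the standard library's xml.etree.ElementTree; the
is_well_formed_xml helper and both sample strings are hypothetical, made
up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical samples: typical HTML tag soup versus a small XML document.
html_soup = "<html><body><p>Symbol<br>BBBY</body></html>"
xml_doc = '<?xml version="1.0"?><quote><symbol>BBBY</symbol></quote>'

print(is_well_formed_xml(html_soup))
print(is_well_formed_xml(xml_doc))
```

The unclosed <p> and <br> in the first sample are legal in HTML but make
an XML parser give up, which is presumably what is happening with the
page in question.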

-john


-- 
[email protected]
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-list
