[EMAIL PROTECTED] wrote:
> > http://moneycentral.msn.com/companyreport?Symbol=BBBY
> >
> > I can't validate it and my standard Python XML parsing tools don't work on it.
> >
> > If this was just some teenager's web site I'd move on. Is there any hope
> > avoiding regular expression hacks to extract the data from this page?

lynx -dump http://moneycentral.msn.com/companyreport?Symbol=BBBY

worked for me. I understand that lynx is not an XML tool, but I would say that this is not an XML document. Nor would file(1):

% file companyreport\?Symbol=BBBY
companyreport?Symbol=BBBY: ASCII HTML document text, with very long lines, with CRLF, LF line terminators

Because, you know, XML documents usually start with:

<?xml ... ?>

-john
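
P.S. If you would rather stay in Python than shell out to lynx, a forgiving HTML parser is probably a better fit than an XML one for a page like this. What follows is only a minimal sketch, assuming Python 3 and the standard library's html.parser module, and assuming the figures you want sit in table cells (I have not checked how that page is actually laid out). The names CellTextParser and extract_rows are made up for illustration, and the sample markup and numbers are invented, not taken from the page.

```python
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row.

    Hypothetical helper: tolerates unclosed <td>/<tr> tags, which are
    legal in HTML but would make an XML parser give up.
    """

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()       # a new row implicitly ends the old one
            self._row = []
        elif tag in ("td", "th"):
            self._close_cell()      # a new cell implicitly ends the old one
            if self._row is None:   # cell with no enclosing <tr>
                self._row = []
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:  # ignore text outside table cells
            self._cell.append(data)

    def _close_cell(self):
        if self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            text = " ".join("".join(self._cell).split())
            self._row.append(text)
            self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def close(self):
        super().close()
        self._close_row()           # flush a row left open at end of input


def extract_rows(html_text):
    """Return a list of rows, each a list of cell strings."""
    parser = CellTextParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows


# Invented sample with deliberately sloppy markup (unclosed cells and rows).
sample = """<html><body><table>
<tr><td>Last Price<td>38.12
<tr><td><b>Volume</b></td><td>1.2 Mil</td></tr>
</table></body>"""

print(extract_rows(sample))
```

To try it on the real page you would fetch the text first (urllib.request.urlopen is one way) and pass the decoded string to extract_rows. Picking out the rows you care about is then a matter of list indexing rather than regular expressions.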
