Hi,

On Mon, Nov 16, 2009 at 12:02 PM, David Pollak <[email protected]> wrote:
>
> On Sun, Nov 15, 2009 at 11:24 PM, <[email protected]> wrote:
>
>> Hello,
>>
>> I am a newbie to both Scala and Lift. Now that that's out of the way, I'm
>> wondering how to properly use PCDataXmlParser to read and parse HTML.
>
> PCDataXmlParser requires well-formed XML. It is an XML parser.
>
> There are plenty of Java libraries that parse HTML. Please use one of
> those for parsing stuff that's not known to be well-formed XML.

I would suggest you start here:
http://www.hars.de/2009/01/html-as-xml-in-scala.html

I implemented this last week and it works well.

>> I pull data from a RESTful service by doing the following:
>>
>> [code]
>> import dispatch._
>> import Http._
>>
>> import net.liftweb.util._;
>> import scala.xml._;
>>
>> def upcDatabase(): Box[NodeSeq] = {
>>   val http = new Http
>>   var stream: String = "";
>>   http("http://www.upcdatabase.com/item/0606949324124" >- (arg => stream = arg))
>>   stream;
>>   PCDataXmlParser(stream);
>> }
>>
>> val feedXML: Box[NodeSeq] = upcDatabase;
>> [/code]
>>
>> When doing this I get the following exception:
>>
>> [exception]
>> INF: [console logger] dispatch: GET http://www.upcdatabase.com/item/0606949324124
>> log4j:WARN No appenders could be found for logger (org.apache.http.impl.conn.SingleClientConnManager).
>> log4j:WARN Please initialize the log4j system properly.
>> :96:5: '<' not allowed in attrib value
>> </a>
>>     ^
>> :97:1: '<' not allowed in attrib value
>> </p>
>> ^
>> :98:1: '<' not allowed in attrib value
>> </td>
>> ^
>> :99:1: '<' not allowed in attrib value
>> <td valign="top" width="70%">
>> ^
>> :99:27: whitespace expected
>> <td valign="top" width="70%">
>>                           ^
>> :99:27: '>' expected instead of '%'
>> <td valign="top" width="70%">
>>                           ^
>> Exception in thread "main" java.lang.ExceptionInInitializerError
>>     at ca.ctrlspace.loveItHateItWeb.xml.UpcDatabaseFeed.main(UpcDatabaseFeed.scala)
>> Caused by: java.lang.RuntimeException: FATAL
>>     at scala.Predef$.error(Predef.scala:76)
>>     at scala.xml.parsing.MarkupParser$class.xToken(MarkupParser.scala:267)
>>     at net.liftweb.util.PCDataXmlParser.xToken(PCDataMarkupParser.scala:91)
>>     at scala.xml.parsing.MarkupParser$class.element1(MarkupParser.scala:680)
>>     at net.liftweb.util.PCDataXmlParser.element1(PCDataMarkupParser.scala:91)
>>     at scala.xml.parsing.MarkupParser$class.content1(MarkupParser.scala:481)
>>     at net.liftweb.util.PCDataXmlParser.content1(PCDataMarkupParser.scala:91)
>>     at scala.xml.parsing.MarkupParser$class.content(MarkupParser.scala:505)
>>     at net.liftweb.util.PCDataXmlParser.content(PCDataMarkupParser.scala:91)
>>     at scala.xml.parsing.MarkupParser$class.element1(MarkupParser.scala:682)
>>     at net.liftweb.util.PCDataXmlParser.element1(PCDataMarkupParser.scala:91)
>>     at scala.xml.parsing.MarkupParser$class.content1(MarkupParser.scala:481)
>>     at net.liftweb.util.PCDataXmlParser.content1(PCDataMarkupParser.scala:91)
>>     at scala.xml.parsing.MarkupParser$class.content(MarkupParser.scala:505)
>>     at net.liftweb.util.PCDataXmlParser.content(PCDataMarkupParser.scala:91)
>>     at scala.xml.parsing.MarkupParser$class.element1(MarkupParser.scala:682)
>>     at net.liftweb.util.PCDataXmlParser.element1(PCDataMarkupParser.scala:91)
>>     at scala.xml.parsing.MarkupParser$class.content1(MarkupParser.scala:481)
>>     at net.liftweb.util.PCDataXmlParser.content1(PCDataMarkupParser.scala:91)
>>     at scala.xml.parsing.MarkupParser$class.content(MarkupParser.scala:505)
>>     at net.liftweb.util.PCDataXmlParser.content(PCDataMarkupParser.scala:91)
>>     at scala.xml.parsing.MarkupParser$class.element1(MarkupParser.scala:682)
>>     at net.liftweb.util.PCDataXmlParser.element1(PCDataMarkupParser.scala:91)
>>     at scala.xml.parsing.MarkupParser$class.content1(MarkupParser.scala:481)
>>     at net.liftweb.util.PCDataXmlParser.content1(PCDataMarkupParser.scala:91)
>>     at scala.xml.parsing.MarkupParser$class.content(MarkupParser.scala:505)
>>     at net.liftweb.util.PCDataXmlParser.content(PCDataMarkupParser.scala:91)
>>     at scala.xml.parsing.MarkupParser$class.element1(MarkupParser.scala:682)
>>     at net.liftweb.util.PCDataXmlParser.element1(PCDataMarkupParser.scala:91)
>>     at scala.xml.parsing.MarkupParser$class.content1(MarkupParser.scala:481)
>>     at net.liftweb.util.PCDataXmlParser.content1(PCDataMarkupParser.scala:91)
>>     at scala.xml.parsing.MarkupParser$class.content(MarkupParser.scala:505)
>>     at net.liftweb.util.PCDataXmlParser.content(PCDataMarkupParser.scala:91)
>>     at scala.xml.parsing.MarkupParser$class.element1(MarkupParser.scala:682)
>>     at net.liftweb.util.PCDataXmlParser.element1(PCDataMarkupParser.scala:91)
>>     at scala.xml.parsing.MarkupParser$class.content1(MarkupParser.scala:481)
>>     at net.liftweb.util.PCDataXmlParser.content1(PCDataMarkupParser.scala:91)
>>     at scala.xml.parsing.MarkupParser$class.content(MarkupParser.scala:505)
>>     at net.liftweb.util.PCDataXmlParser.content(PCDataMarkupParser.scala:91)
>>     at scala.xml.parsing.MarkupParser$class.element1(MarkupParser.scala:682)
>>     at net.liftweb.util.PCDataXmlParser.element1(PCDataMarkupParser.scala:91)
>>     at scala.xml.parsing.MarkupParser$class.content1(MarkupParser.scala:481)
>>     at net.liftweb.util.PCDataXmlParser.content1(PCDataMarkupParser.scala:91)
>>     at scala.xml.parsing.MarkupParser$class.content(MarkupParser.scala:505)
>>     at net.liftweb.util.PCDataXmlParser.content(PCDataMarkupParser.scala:91)
>>     at scala.xml.parsing.MarkupParser$class.element1(MarkupParser.scala:682)
>>     at net.liftweb.util.PCDataXmlParser.element1(PCDataMarkupParser.scala:91)
>>     at scala.xml.parsing.MarkupParser$class.content1(MarkupParser.scala:481)
>>     at net.liftweb.util.PCDataXmlParser.content1(PCDataMarkupParser.scala:91)
>>     at scala.xml.parsing.MarkupParser$class.content(MarkupParser.scala:505)
>>     at net.liftweb.util.PCDataXmlParser.content(PCDataMarkupParser.scala:91)
>>     at scala.xml.parsing.MarkupParser$class.document(MarkupParser.scala:207)
>>     at net.liftweb.util.PCDataXmlParser.document(PCDataMarkupParser.scala:91)
>>     at net.liftweb.util.PCDataXmlParser$.apply(PCDataMarkupParser.scala:112)
>>     at ca.ctrlspace.loveItHateItWeb.xml.UpcDatabaseFeed$.upcDatabase(UpcDatabaseFeed.scala:16)
>>     at ca.ctrlspace.loveItHateItWeb.xml.UpcDatabaseFeed$.<init>(UpcDatabaseFeed.scala:19)
>>     at ca.ctrlspace.loveItHateItWeb.xml.UpcDatabaseFeed$.<clinit>(UpcDatabaseFeed.scala)
>>     ... 1 more
>> [/exception]
>>
>> What is the proper way to parse non-strict HTML? I thought PCDataXmlParser
>> allowed for non-strict XML, as opposed to XML.load().
>>
>> Thanks,
>>
>> Chri
>
> --
> Lift, the simply functional web framework http://liftweb.net
> Beginning Scala http://www.apress.com/book/view/1430219890
> Follow me: http://twitter.com/dpp
> Surf the harmonics
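
P.S. To make the "well-formed XML" point from the thread concrete, here is a rough sketch that checks whether a string is well-formed before it is handed to an XML parser such as PCDataXmlParser. It is not Lift code and not the approach from the hars.de article; it only uses the XML parser that ships with the JDK. The names WellFormedCheck and check, and the sample strings, are made up for illustration.

```scala
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import org.xml.sax.{ErrorHandler, InputSource, SAXParseException}

// Hypothetical helper (not part of Lift or dispatch): reports whether
// a string is well-formed XML, using only the JDK's built-in parser.
object WellFormedCheck {

  /** Right(()) if `text` parses as XML, otherwise Left with position and reason. */
  def check(text: String): Either[String, Unit] = {
    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    // Replace the default handler (which prints to stderr) with one that
    // simply rethrows, so every problem surfaces as an exception.
    builder.setErrorHandler(new ErrorHandler {
      def warning(e: SAXParseException): Unit = ()
      def error(e: SAXParseException): Unit = throw e
      def fatalError(e: SAXParseException): Unit = throw e
    })
    try {
      builder.parse(new InputSource(new StringReader(text)))
      Right(())
    } catch {
      case e: SAXParseException =>
        Left(s"line ${e.getLineNumber}, column ${e.getColumnNumber}: ${e.getMessage}")
    }
  }

  def main(args: Array[String]): Unit = {
    // Well-formed: quoted attributes, matching end tag.
    println(check("""<td valign="top" width="70%">ok</td>"""))
    // Not well-formed: the href quote is never closed, so '<' ends up
    // inside the attribute value, similar to the errors in the trace.
    println(check("""<p><a href="x>link</a></p>"""))
    // Not well-formed: <br> without an end tag is fine in HTML, not in XML.
    println(check("<p>line<br></p>"))
  }
}
```

If check returns a Left for the page you fetched, that page needs an HTML parser (as suggested in the replies above) rather than PCDataXmlParser or XML.load().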
