I've pretty much done what you suggested, Teo. I'm iterating through the HTML string and extracting the bits I want.
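Teo's suggestion (quoted further down) is to walk the response text with ordinary String methods instead of building a DOM. Below is a minimal sketch of that kind of scan. It assumes each bit of interest sits between a fixed start marker and end marker. The helper name `extractBetween` and the sample markup are hypothetical and do not come from Judah's code.

```javascript
// Hypothetical helper: collect every substring that sits between
// startMarker and endMarker, scanning the text from left to right.
function extractBetween(text, startMarker, endMarker) {
  var results = [];
  for (var pos = text.indexOf(startMarker); pos !== -1;
       pos = text.indexOf(startMarker, pos)) {
    var contentStart = pos + startMarker.length;
    var contentEnd = text.indexOf(endMarker, contentStart);
    if (contentEnd === -1) {
      break; // unterminated fragment: stop scanning
    }
    results.push(text.substring(contentStart, contentEnd));
    pos = contentEnd + endMarker.length; // resume after the end marker
  }
  return results;
}

// Example with made-up markup standing in for responseText:
var page = '<ul><li class="item">First</li><li class="item">Second</li></ul>';
var items = extractBetween(page, '<li class="item">', '</li>');
// items is ['First', 'Second']
```

Because this only uses `indexOf` and `substring`, it doesn't care whether the page is well-formed. The trade-off is that it breaks as soon as the site changes its markup or the markers turn up somewhere unexpected.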
There is one thing I'm still wondering about, though. I was hoping to put the bits of HTML I extracted into an XML document, because that makes it easier to pick out data and saves parsing every level of the tree recursively. I concatenated the HTML strings I obtained into one string, added enclosing tags before and after to make it well-formed, and tried to load it as XML, following an XMLHttpRequest example from David Flanagan's JavaScript book (O'Reilly). Any ideas why it isn't working? Perhaps the HTML I've extracted and put into the XML string is not well-formed?

And do you think this approach is advisable, or would you recommend recursively parsing the entire HTML document myself?

Judah

On Aug 21, 9:34 am, Teo <[EMAIL PROTECTED]> wrote:
> When I needed to do this, I made my own parser. It also depends on what you
> want to do, but it shouldn't be hard. Basically, you make a new String object
> from the response text, like this:
>
> var s = new String(responseText);
>
> Then you go through the string with a good old for loop.
> Here's a list of String properties and methods in
> JavaScript: http://www.w3schools.com/jS/js_obj_string.asp
>
> Thanks,
> Teo
>
> On Thu, Aug 21, 2008 at 9:16 AM, Judah <[EMAIL PROTECTED]> wrote:
>
> > Thanks, Teo.
> > It seems your hunch is right. The various URLs I've tried
> > loading as XML (via the request's responseText property) contain
> > malformed XML. The article you sent me helped me determine that.
> >
> > I'm now thinking that my only option, if I want to grab a web page, is
> > to use regular expressions. Do you know of another way to grab data
> > (HTML, say) from web pages with Google Desktop?
> >
> > Judah
> >
> > On 19 August, 09:44, Teo <[EMAIL PROTECTED]> wrote:
> > > Hi,
> > >
> > > here is an article about XML parsing; maybe it helps:
> > > http://code.google.com/apis/desktop/articles/2.html
> > >
> > > Are you sure the XML is not malformed?
> > >
> > > Thanks,
> > > Teo
> > >
> > > On Mon, Aug 18, 2008 at 9:34 PM, Judah <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi there,
> > > >
> > > > I'm trying to read an HTML page from the internet and extract data
> > > > from it for my gadget.
> > > >
> > > > I've tried parsing the page into XML, but to no avail: each time
> > > > I do so, the DOMDocument I receive has no child nodes. From the
> > > > debugger I can see that the HTML string is there in its entirety.
> > > >
> > > > Has anyone done this before? There was a post about parsing files
> > > > to XML here:
> > > > http://groups.google.com/group/Google-Desktop-Developer/browse_thread...
> > > > and although that should work the same way for pages from the
> > > > internet, it doesn't seem to.
> > > >
> > > > If anyone can suggest how I might overcome this, or where I am going
> > > > wrong, I would be most grateful.
> > > >
> > > > Simon
> > > >
> > > > The code I've written (if it helps) is:
> > > >
> > > > var URL = "http://www.tinyurl.com";
> > > > var logoRequest_ = null;
> > > >
> > > > function internetConnectionOpen() {
> > > >   // Start to download the page
> > > >   logoRequest_ = new XMLHttpRequest();
> > > >   try {
> > > >     logoRequest_.open("GET", URL, true);
> > > >   } catch (e) {
> > > >     logoRequest_ = null;
> > > >     return;
> > > >   }
> > > >
> > > >   // Set the callback for when the download completes (or fails)
> > > >   logoRequest_.onreadystatechange = onLogoData;
> > > >
> > > >   // Start the download
> > > >   try {
> > > >     logoRequest_.send();
> > > >   } catch (e) {
> > > >     // Catch errors sending the request
> > > >     debug.info(e);
> > > >     logoRequest_ = null;
> > > >     return;
> > > >   }
> > > > }
> > > >
> > > > function onLogoData() {
> > > >   // Verify that the download completed
> > > >   if (logoRequest_.readyState != 4)
> > > >     return;
> > > >
> > > >   // Verify that the download was successful
> > > >   if (logoRequest_.status != 200) {
> > > >     logoRequest_ = null;
> > > >     return;
> > > >   }
> > > >
> > > >   // Parse the downloaded page as XML and hand it to the consumer
> > > >   var xmlDoc = new DOMDocument();
> > > >   xmlDoc.loadXML(logoRequest_.responseText);
> > > >   webPageConsumerCallbackFunction(xmlDoc);
> > > >
> > > >   // Destroy the XMLHttpRequest object since it isn't being used anymore
> > > >   logoRequest_ = null;
> > > > }
> > > >
> > > > function webPageConsumerCallbackFunction(DOM) {
> > > >   debug.info("reached webPageConsumerCallbackFunction");
> > > >   if (DOM == null) {
> > > >     alert("unable to connect to the website");
> > > >     return;
> > > >   } else if (DOM.childNodes.length == 0) {
> > > >     alert("DOM is empty. unable to extract data");
> > > >     return;
> > > >   }
> > > >
> > > >   // otherwise
> > > >   // digest the web page ....
> > > > }
> > >
> > > --
> > > Teo (a.k.a. Teodor Filimon, Teominator)
> > > Site - www.teodorfilimon.com | Blog - www.teodorfilimon.blogspot.com
> > > GMT +2 (or PDT +10)

--
You received this message because you are subscribed to the Google Groups "Google Desktop Developer Group" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/Google-Desktop-Developer?hl=en
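Simon's code hands `responseText` straight to `DOMDocument.loadXML`, and later in the thread Judah confirms that the pages weren't well-formed XML, which would explain why the resulting document has no child nodes. One rough way to test Teo's hunch on a given string is to check whether its tags balance. The sketch below is a simplified checker; the name `findTagMismatch` is hypothetical. It strips comments and declarations, matches element names with a stack, and ignores attribute quoting, CDATA sections and entities. A reported mismatch usually points at the offending tag, but passing the check does not prove the string is well-formed XML.

```javascript
// Simplified tag-balance check (hypothetical helper, NOT a full XML parser).
// Returns null if every start tag is closed in the right order, otherwise
// a short message describing the first problem it finds.
function findTagMismatch(markup) {
  // Drop comments, <!...> declarations (e.g. <!DOCTYPE html>) and <?...?>
  // processing instructions, which have no closing partner.
  var stripped = markup.replace(/<!--[\s\S]*?-->|<![^>]*>|<\?[\s\S]*?\?>/g, '');
  // Group 1: "/" for end tags; group 2: element name; group 3: "/" for <x/>.
  var tagPattern = /<(\/?)([A-Za-z][\w:.\-]*)[^>]*?(\/?)>/g;
  var openTags = [];
  var match;
  while ((match = tagPattern.exec(stripped)) !== null) {
    var isEndTag = match[1] === '/';
    var name = match[2];
    if (match[3] === '/') {
      continue; // self-closing element such as <br/> needs no partner
    }
    if (!isEndTag) {
      openTags.push(name);
    } else if (openTags.length === 0 || openTags[openTags.length - 1] !== name) {
      return 'unexpected </' + name + '>';
    } else {
      openTags.pop();
    }
  }
  if (openTags.length > 0) {
    return 'unclosed <' + openTags[openTags.length - 1] + '>';
  }
  return null;
}

// Typical HTML fails because void elements like <br> are never closed:
var htmlResult = findTagMismatch('<root><p>One<br>Two</p></root>');
// htmlResult is 'unexpected </p>'
var xmlResult = findTagMismatch('<root><p>One<br/>Two</p></root>');
// xmlResult is null
```

Running something like this on the `responseText` from the debugger is a quick way to see which tag trips up the parser before deciding whether to clean the markup or avoid XML altogether.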

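On Judah's latest question: wrapping the extracted fragments in a root element only helps if the fragments themselves are valid XML, and HTML often isn't. Common culprits are void elements written without a closing slash (`<br>`, `<img ...>`), named entities such as `&nbsp;` that XML does not predefine, and bare `&` characters in text or URLs. Below is one possible regex-based clean-up pass. It is a sketch that assumes the fragments are otherwise balanced; the function name `htmlFragmentsToXml` and the `fragments` root name are hypothetical. It is not an HTML parser: it will not repair unclosed `<p>` or `<li>` tags, and HTML entities other than `&nbsp;` (for example `&copy;`) would need similar numeric replacements. It sticks to plain loops and `var` in case the gadget's script engine is an older one.

```javascript
// Hypothetical clean-up pass: make extracted HTML fragments more
// XML-friendly, then wrap them in a single root element.
// Regex-based sketch, not a real HTML parser.
var VOID_ELEMENTS = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img',
                     'input', 'link', 'meta', 'param', 'source', 'wbr'];

function htmlFragmentsToXml(fragments, rootName) {
  // Matches <br>, <br />, <img src="a.png"> etc.; captures name and attributes.
  var voidPattern = new RegExp(
      '<(' + VOID_ELEMENTS.join('|') + ')\\b([^>]*?)\\s*/?>', 'gi');
  // Matches "&" that does not already start a character reference
  // such as &amp;, &#160; or &#xA0;.
  var bareAmpersand = /&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/g;

  var parts = [];
  for (var i = 0; i < fragments.length; i++) {
    parts.push(fragments[i]
        .replace(bareAmpersand, '&amp;')
        // &nbsp; is not one of XML's predefined entities; use its numeric form.
        .replace(/&nbsp;/g, '&#160;')
        // Self-close void elements: <br> becomes <br/>.
        .replace(voidPattern, '<$1$2/>'));
  }
  return '<' + rootName + '>' + parts.join('') + '</' + rootName + '>';
}

// Example with made-up fragments:
var xml = htmlFragmentsToXml(
    ['<p>Tom &amp; Jerry&nbsp;<br></p>', '<img src="logo.png">'], 'fragments');
// xml is '<fragments><p>Tom &amp; Jerry&#160;<br/></p><img src="logo.png"/></fragments>'
```

The resulting string is what would then go to `loadXML`, as in Simon's code.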