Re: DOM Parser

Judah Thu, 28 Aug 2008 09:28:57 -0700

Thanks for the reply, Teo.

Judah


P.S. I gather English is not your mother tongue. I've noticed that you
sign off your forum posts with 'Thanks, Teo'.
Allow me to give you a word of advice. When you sign off a forum post
or email in English, you don't thank the person it is meant for.
You can say many things. You could say:
'Good luck' / 'Best of luck',
'Hope this helps, let me know if you need further assistance',
etc.
There are also more formal sign-offs, but they aren't
appropriate for forum posts, which are by their very nature informal.
At any rate, when you sign off a post I recommend altering the
signature.

On Aug 28, 4:07 pm, Teo <[EMAIL PROTECTED]> wrote:
> Maybe there are certain characters, like escape characters
> <http://www.html-reference.com/Escape.htm>, present in the
> actual data. You would have to use the encoded form if such a
> character is there; maybe this is your problem.
>
> About what you want to do: if you're already directly parsing the HTML, it
> may be a bit overkill to manually create the XML string, if I understand
> correctly. However, if, as you said, you need some special data structures
> and this whole fetching and parsing process isn't called very often (so it
> won't consume a lot of CPU), I think you could invest some code in the HTML ->
> XML transformation (but be careful with those escape characters if you do it
> :).
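
To illustrate the point about encoded forms, here is a minimal sketch of escaping extracted text before wrapping it in tags. The names escapeXml and wrapAsXml and the <items>/<item> element names are made up for this example and are not from the thread. It assumes the extracted pieces are meant to be plain text content, so any markup inside them is escaped rather than kept.

```javascript
// Hypothetical helper (not from the thread): replace the five characters
// that are special in XML with their predefined entity references.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")   // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical wrapper: build a small XML string from an array of extracted
// text pieces. The element names are placeholders.
function wrapAsXml(items) {
  var parts = ["<items>"];
  for (var i = 0; i < items.length; i++) {
    parts.push("<item>" + escapeXml(items[i]) + "</item>");
  }
  parts.push("</items>");
  return parts.join("");
}
```

Something like wrapAsXml(["Tom & Jerry"]) should then give a string that an XML parser accepts, whereas the raw "&" on its own would not be.
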
>
> Thanks,
> Teo
>
>
>
> On Thu, Aug 28, 2008 at 1:39 PM, Judah <[EMAIL PROTECTED]> wrote:
>
> > I've pretty much done what you suggested, Teo.
> > I'm iterating through the HTML string, extracting the bits I want.
>
> > There is something I still wondered about, though.
> > I was hoping to be able to put the bits of HTML I extracted into an
> > XML file because it's easier to pick out data and saves parsing every
> > level in the tree recursively. I concatenated the strings of HTML that
> > I obtained into another string, added a couple of tags before and
> > after to make it well-formed, and attempted loading it into a string
> > using an XMLHttpRequest (using an example in David Flanagan's book on
> > JavaScript (O'Reilly publishing)).
> > Any ideas why it wouldn't be working?
> > Perhaps the HTML I've extracted and put into an XML string is not
> > well-formed?
> > And do you think such an approach is advisable, or would you recommend
> > recursively parsing the entire HTML document myself?
>
> > Judah
>
> > On Aug 21, 9:34 am, Teo <[EMAIL PROTECTED]> wrote:
> > > When I needed to do this, I made my own parser. It also depends on what you
> > > want to do, but it shouldn't be hard. Basically you make a new String
> > > object from the response text, like this:
>
> > > *var s=new String(responseText);*
>
> > > Then with a good old *for* you go through the string.
> > > Here's a list of String properties and methods in JavaScript:
> > > http://www.w3schools.com/jS/js_obj_string.asp
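
A rough sketch of what going through the string can look like in practice. This version uses indexOf and substring rather than a character-by-character for loop, which is a swapped-in technique and not necessarily what Teo had in mind. The function name extractBetween and the marker parameters are hypothetical.

```javascript
// Hypothetical helper: collect every substring that sits between
// startMarker and endMarker, scanning the text from left to right.
function extractBetween(text, startMarker, endMarker) {
  var results = [];
  if (!startMarker || !endMarker) {
    return results; // empty markers would never advance the scan
  }
  var pos = 0;
  while (true) {
    var start = text.indexOf(startMarker, pos);
    if (start === -1) break;              // no more opening markers
    start += startMarker.length;          // skip past the marker itself
    var end = text.indexOf(endMarker, start);
    if (end === -1) break;                // unterminated piece, stop here
    results.push(text.substring(start, end));
    pos = end + endMarker.length;         // continue after the closing marker
  }
  return results;
}
```

For instance, extractBetween(responseText, "<title>", "</title>") would return whatever text sits inside each title element, without any XML parsing involved.
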
>
> > > Thanks,
> > > Teo
>
> > > On Thu, Aug 21, 2008 at 9:16 AM, Judah <[EMAIL PROTECTED]> wrote:
>
> > > > Thanks, Teo.
> > > > It seems your hunch is right. The various URLs that I've tried
> > > > loading into XML (via the responseText property of the response) return
> > > > malformed XML.
> > > > The article you sent me helped me determine that.
>
> > > > I'm thinking now that my only option if I wish to grab a web page is
> > > > to use regular expressions. Do you know of another way to grab data
> > > > (HTML, say) from web pages using the desktop?
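
For reference, a minimal sketch of the regular-expression route, pulling the href of every link out of an HTML string. The function name extractLinks is made up for illustration. The pattern assumes quoted attribute values and will miss unquoted ones, so treat it as a starting point rather than a general HTML parser.

```javascript
// Hypothetical helper: return the href value of every <a> tag in an HTML string.
// Only handles href values wrapped in single or double quotes.
function extractLinks(html) {
  var pattern = /<a\b[^>]*\bhref\s*=\s*["']([^"']*)["']/gi;
  var links = [];
  var match;
  // With the g flag, exec() resumes after the previous match on each call.
  while ((match = pattern.exec(html)) !== null) {
    links.push(match[1]);
  }
  return links;
}
```

This works on the raw responseText, so it does not matter whether the page is well-formed XML.
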
>
> > > > Judah
>
> > > > On 19 August, 09:44, Teo <[EMAIL PROTECTED]> wrote:
> > > > > Hi,
>
> > > > > here is an article about XML parsing; maybe it helps:
> > > > > http://code.google.com/apis/desktop/articles/2.html
>
> > > > > Are you sure the XML is not malformed?
>
> > > > > Thanks,
> > > > > Teo
>
> > > > > On Mon, Aug 18, 2008 at 9:34 PM, Judah <[EMAIL PROTECTED]> wrote:
>
> > > > > > Hi there,
>
> > > > > > I'm trying to read an HTML page from the internet and extract data
> > > > > > from it for my gadget.
>
> > > > > > I've opted to try parsing the page into XML, but to no avail.
>
> > > > > > Each time I do so, I find the DOMDocument I receive has no child
> > > > > > nodes.
> > > > > > From the debugger I can see that the HTML string is there in its
> > > > > > entirety.
>
> > > > > > Has anyone done this before?
> > > > > > There was a post about parsing files to XML here
>
> > > > > > http://groups.google.com/group/Google-Desktop-Developer/browse_thread...
>
> > > > > > and although it should work for pages from the internet in the same
> > > > > > way, it doesn't seem to.
>
> > > > > > If anyone can suggest how I might overcome this or where I am going
> > > > > > wrong I would be most grateful.
>
> > > > > > Simon
>
> > > > > > The code I've written (if it helps) is:
>
> > > > > > var URL = "http://www.tinyurl.com";
>
> > > > > > var logoRequest_ = null;
>
> > > > > > function internetConnectionOpen() {
> > > > > >  // Start to download the page
> > > > > >  logoRequest_ = new XMLHttpRequest();
> > > > > >  try {
> > > > > >    logoRequest_.open("GET", URL, true);
> > > > > >  } catch (e) {
> > > > > >    logoRequest_ = null;
> > > > > >    return;
> > > > > >  }
>
> > > > > >  // Set the callback for when the downloading is completed (or failed)
> > > > > >  logoRequest_.onreadystatechange = onLogoData;
>
> > > > > >  // Start the download
> > > > > >  try {
> > > > > >    logoRequest_.send();
> > > > > >  } catch (e) {
> > > > > >    // Catch errors sending the request
> > > > > >    debug.info(e);
> > > > > >    logoRequest_ = null;
> > > > > >    return;
> > > > > >  }
> > > > > > }
>
> > > > > > function onLogoData() {
> > > > > >  // Verify that the download completed
> > > > > >  if (logoRequest_.readyState != 4)
> > > > > >    return;
>
> > > > > >  // Verify that the download was successful
> > > > > >  if (logoRequest_.status != 200) {
> > > > > >    logoRequest_ = null;
> > > > > >    return;
> > > > > >  }
>
> > > > > >  // Parse the response text as XML and pass the document on
> > > > > >  var xmlDoc = new DOMDocument();
> > > > > >  xmlDoc.loadXML(logoRequest_.responseText);
> > > > > >  webPageConsumerCallbackFunction(xmlDoc);
>
> > > > > >  // Destroy the XMLHttpRequest object since it isn't being used anymore
> > > > > >  logoRequest_ = null;
> > > > > > }
>
> > > > > > function webPageConsumerCallbackFunction(DOM)
> > > > > > {
> > > > > >        debug.info("reached webPageConsumerCallbackFunction");
> > > > > >        if (DOM == null)
> > > > > >        {
> > > > > >                alert("unable to connect to the website");
> > > > > >                return;
> > > > > >        }
> > > > > >        else if (DOM.childNodes.length == 0)
> > > > > >        {
> > > > > >                alert("DOM is empty. unable to extract data");
> > > > > >                return;
> > > > > >        }
>
> > > > > > //otherwise
> > > > > > //digest the web page ....
>
> > > > > > }
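
An empty document after loadXML usually means the parser rejected the input. The sketch below assumes the gadget's DOMDocument behaves like the MSXML one and exposes a parseError object with errorCode, reason, line and linepos. That is an assumption, since the thread does not confirm it. The function name describeParseError is hypothetical, and it only reads those fields, so it can be tried against any object of that shape.

```javascript
// Hypothetical helper: summarise why loadXML produced an empty document.
// Assumes an MSXML-style parseError object (errorCode, reason, line, linepos).
function describeParseError(xmlDoc) {
  if (!xmlDoc) {
    return "no document";
  }
  var err = xmlDoc.parseError;
  if (!err || !err.errorCode) {
    return "no parse error reported"; // errorCode 0 means the parse succeeded
  }
  return "XML parse error " + err.errorCode + " at line " + err.line +
      ", column " + err.linepos + ": " + err.reason;
}
```

Calling debug.info(describeParseError(DOM)) in the "DOM is empty" branch of webPageConsumerCallbackFunction might then show which line of the page the parser choked on.
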
>
> > > > > --
> > > > > Teo (a.k.a. Teodor Filimon, Teominator)
> > > > > Site -www.teodorfilimon.com|Blog -www.teodorfilimon.blogspot.com
> > > > > GMT +2 (or PDT +10)
>
> > > --
> > > Teo (a.k.a. Teodor Filimon, Teominator)
> > > Site -www.teodorfilimon.com|Blog -www.teodorfilimon.blogspot.com
> > > GMT +2 (or PDT +10)
>
> --
> Teo (a.k.a. Teodor Filimon, Teominator)
> Site -www.teodorfilimon.com| Blog -www.teodorfilimon.blogspot.com
> GMT +2 (or PDT +10)
