Brett Clippingdale wrote:
On Fri, 2005-03-04 at 17:36 -0800, Heikki Toivonen wrote:
I thought about this a little bit, and I think by far the easiest way
would be to hook into the webserver parcel in Chandler. Not the sexiest
Your solution is an interesting one, to be certain. The mechanics of
the toolbar are not difficult; the interesting problem here is
connecting data to the repository. You may well know that, in addition
to using forms, JavaScript/XUL can do a POST to a server with its
XMLHttpRequest(), which is the hack used for Google's autosuggest, maps,
Yes, XMLHttpRequest was one of my main areas of responsibility at
Netscape from NS 6 to 7.1.
IIRC, you want to send both the URL and the page contents to the
repository, and I'm thankful for the guidance on how to connect it to
the repository. I'll look into that, and will contact Morgen.
I think you may want to send even more - the HTTP headers. But one step
at a time: URL, URL+data, URL+data+headers. I think this is also the
order of increasing difficulty.
The URL is trivial.
What I don't know is how to handle the DOM, which is what I presume we
want to store, in addition to the URL string. (Is this correct? That's
a *lot* of data!) XMLHttpRequest() has a send() command where the real
data will be sent, and one can send a string, the entire DOM, or a DOM
fragment. In any case, AFAIK the object will be/must be serialized
using Mozilla's XPCOM serializer, nsDOMSerializer.
Does Python have a library that will de-serialize these objects? For
instance, I'm not sure if Mozilla's is the standard W3C DOM "Load and
Save Level 3," which I presume Python can handle, or if there is any
relation to Apache's asDOMSerializer interface, which is also used with
Java. Better yet, does the webserver parcel have an interface that can
handle this stuff?
I'm also concerned about how non-W3C-compliant web pages will be passed.
I hope someone more experienced with these issues will know, otherwise
I'll continue to research it.
The DOM serializer is most likely the piece you'd need to use. Serialize
to a string first, then send the string with XMLHttpRequest in a POST
operation. (send() can take a DOM document as well, but it will use the
XML serializer, which would be bad in most cases since we'd be dealing
with HTML -
http://lxr.mozilla.org/seamonkey/source/extensions/xmlextras/base/src/nsXMLHttpRequest.cpp#1429.)
This will mean that the data you send is not exactly the data Mozilla
received over the wire, but I think that is ok. You see, Mozilla does a
lot of work to understand the tag soup out there that is sometimes
called HTML, and will build a DOM it thinks the authors meant. Typically
only the markup will change, becoming well-formed; in some extremely rare
cases with horrible markup, some actual text may be lost. So, you'd be
sending the data as it was understood by Mozilla, which I think is fine -
that's what you see in the browser anyway. The added benefit is that on
the Chandler side you will be dealing with well-formed HTML, in case you
want to parse it (although I guess I would just store it as a text
object and let Chandler index it as is).
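
A minimal sketch of what the receiving side might look like, assuming
the toolbar POSTs the serialized markup as the request body and passes
the page address in a request header. It uses the handler classes from
Python's standard library, not the webserver parcel's real interface,
which this thread does not show. The header name X-Source-URL, the
function build_clipping and the class ClippingHandler are all
hypothetical names chosen for illustration.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical header carrying the address of the captured page.
URL_HEADER = "X-Source-URL"


def build_clipping(url, raw_body, charset="utf-8"):
    """Return a plain record: the page URL plus the markup as text."""
    # Decode leniently; the markup is stored as text, not parsed.
    html = raw_body.decode(charset, errors="replace")
    return {"url": url, "html": html}


class ClippingHandler(BaseHTTPRequestHandler):
    # Stand-in for the repository (assumption); a real handler would
    # create a repository item here instead of appending to a list.
    store = []

    def do_POST(self):
        # Read exactly as many bytes as the client declared.
        length = int(self.headers.get("Content-Length", 0))
        url = self.headers.get(URL_HEADER, "")
        self.store.append(build_clipping(url, self.rfile.read(length)))
        self.send_response(204)  # success, no response body
        self.end_headers()
```

To try it locally, the handler can be passed to http.server.HTTPServer
bound to 127.0.0.1. Keeping the markup as one text value matches the
"store it as text and let Chandler index it" approach above; parsing
could be added later without changing what the toolbar sends.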
Regarding headers, I think the best thing would be to take a look at the
Live HTTP Headers extension (http://livehttpheaders.mozdev.org/).
--
Heikki Toivonen
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Open Source Applications Foundation "Dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/dev