I just replied to this and cc'ed mozilla-editor, since it
concerns the editor code for HTML copy/paste.
Here's the original for those of you who didn't see it on the
Unix newsgroups.
...Akkana
----- Forwarded message from Toastie <[EMAIL PROTECTED]> -----
From: Toastie <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED], [EMAIL PROTECTED]
Date: Sun, 23 Sep 2001 01:48:35 +0300
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.4) Gecko/20010920
Subject: HTML in Mozilla for X11 clipboard
Hi,
I'm using both Mozilla and Konqueror for my daily browsing on Linux, and
have recently begun looking into possible interoperability between Linux
desktop applications. As part of my research, I began looking at the
information various applications export to the clipboard. Mozilla
typically offers HTML contents on the clipboard, which can be
successfully used in rich text editors (HTML or not) as well as in other
office applications (e.g. pasting HTML tables into a spreadsheet).
Currently, Mozilla exports the following "data formats" for a selection
from a web page (in UTF-16 encoding):
text/html - with the contents of the selection
text/_moz_htmlcontext - with the surroundings of the selection (the need
for it is unclear to me)
text/_moz_htmlinfo - with two integers that always seem to be 0,0
(I'd be glad to hear an explanation of those values)
Since then, I have implemented exporting of the "text/html" format in KHTML
(Konqueror's HTML engine) and noticed that Mozilla's implementation does
something my simple HTML conversion of a DOM Range doesn't:
a selection such as "<b>one [two] three</b>" (where "two" is selected)
in Mozilla results in the clipboard containing "<b>two</b>" instead
of just "two". That's a nice feature, especially for word processors
(e.g. Composer) where the user expects to copy the text along with its
formatting.
I assume Mozilla implements this by keeping a list of formatting HTML
tags (<b>, <i>, <font color=...> etc.) and walking the HTML tree from
the selection up through its ancestors, collecting those tags on the way.
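The guessed approach can be sketched like this. It is only a minimal Python illustration under stated assumptions, not Mozilla's actual code: the `Node` class, the `FORMATTING_TAGS` set and `wrap_with_ancestor_formatting()` are hypothetical names, and the container is assumed to be the element holding the selected text.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical set of inline formatting tags worth carrying to the clipboard.
FORMATTING_TAGS = {"b", "i", "u", "em", "strong", "font"}


@dataclass
class Node:
    """Minimal stand-in for a DOM element (hypothetical, not a real DOM API)."""
    tag: str
    attrs: dict = field(default_factory=dict)
    parent: Optional["Node"] = None


def open_tag(node: Node) -> str:
    attrs = "".join(f' {k}="{v}"' for k, v in node.attrs.items())
    return f"<{node.tag}{attrs}>"


def wrap_with_ancestor_formatting(fragment: str, container: Node) -> str:
    """Wrap the serialized selection in the formatting tags of its ancestors."""
    chain = []
    node: Optional[Node] = container
    # Walk up from the element containing the selection toward the root.
    while node is not None:
        if node.tag in FORMATTING_TAGS:
            chain.append(node)
        node = node.parent
    # chain is innermost-first, so wrapping in this order puts outer tags outside.
    for ancestor in chain:
        fragment = f"{open_tag(ancestor)}{fragment}</{ancestor.tag}>"
    return fragment


# "<b>one [two] three</b>" with "two" selected: the container is the <b>.
body = Node("body")
bold = Node("b", parent=body)
print(wrap_with_ancestor_formatting("two", bold))  # <b>two</b>
```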
I want to extend that feature. Instead of adding surrounding formatting
tags to the HTML put on the clipboard, I propose adding a clipboard
format called "text/css", which would contain a getComputedStyle() dump for
the first text node of the selection's DOM range. That way, we could pass
on the actual style of the text, however complicated it might be.
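The proposal doesn't pin down what a "text/css" payload looks like. As an assumption for illustration only, here is a sketch where it is a plain CSS declaration list, one "property: value;" per line; the `computed` dict stands in for what getComputedStyle() would report, and both helper names are hypothetical.

```python
# Hypothetical snapshot of the computed style of the element holding the
# first selected text node; in a browser this would come from getComputedStyle().
computed = {
    "font-weight": "700",
    "font-style": "normal",
    "color": "rgb(255, 0, 0)",
    "font-family": "Times, serif",
}


def to_text_css(style: dict) -> str:
    """Serialize property/value pairs as a CSS declaration list (assumed format)."""
    return "".join(f"{prop}: {value};\n" for prop, value in style.items())


def from_text_css(text: str) -> dict:
    """Parse the declaration list back; assumes no ';' appears inside values."""
    style = {}
    for decl in text.split(";"):
        if ":" in decl:
            prop, value = decl.split(":", 1)
            style[prop.strip()] = value.strip()
    return style


print(to_text_css(computed))
```

A real dump would list every computed property, so the receiving side would likely want to ignore values that match its own defaults.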
Since Composer uses HTML formatting tags in its HTML, we'll extend
pasting in Composer to convert the CSS to <B>, <I> etc. as much as
possible, and the rest to a <SPAN> tag. This way Composer will gain
the ability to preserve any given style of text from the source page.
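A rough sketch of that paste-side conversion, assuming the style arrives as property/value pairs. The mapping table (bold, italic, underline) and the function name are assumptions for illustration, not Composer's actual paste code.

```python
from html import escape


def css_to_composer_html(fragment: str, style: dict) -> str:
    """Express a computed style with HTML formatting tags where possible.

    `fragment` is already-serialized HTML. Properties with no tag equivalent
    end up in a <SPAN STYLE="...">. The mapping is a hypothetical example.
    """
    leftover = dict(style)
    tags = []
    weight = leftover.pop("font-weight", "normal")
    if weight in ("bold", "bolder") or (weight.isdigit() and int(weight) >= 600):
        tags.append("B")
    if leftover.pop("font-style", "normal") in ("italic", "oblique"):
        tags.append("I")
    if "underline" in leftover.pop("text-decoration", "none").split():
        tags.append("U")
    out = fragment
    # Whatever could not be mapped to a tag becomes inline CSS on a SPAN.
    if leftover:
        decls = "; ".join(f"{prop}: {value}" for prop, value in leftover.items())
        out = f'<SPAN STYLE="{escape(decls, quote=True)}">{out}</SPAN>'
    # Iterate in reverse so the first tag in `tags` ends up outermost.
    for tag in reversed(tags):
        out = f"<{tag}>{out}</{tag}>"
    return out


print(css_to_composer_html("two", {"font-weight": "700", "color": "red"}))
# <B><SPAN STYLE="color: red">two</SPAN></B>
```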
There are certain things I'm still wondering about though:
1. Changing the clipboard encoding. This is discussed in bug 44496.
Since declaring formats in the X clipboard doesn't imply actual transfer
(or even generation) of the data, we can easily offer multiple encodings
without a performance penalty. UTF-16 makes interoperability a bit
complicated, since it contains NUL bytes and is not the usual encoding
you'd expect to find on a clipboard. I propose that the clipboard
contain:
a) a "text/html;charset=UTF-8" type in UTF-8 encoding.
b) a "text/html" type with all non-Latin-1 characters encoded as numeric
character references (&#xNUMBER;). This won't be the preferred format (since
it would end up much larger than UTF-8), but it would be the least ambiguous.
If we still intend to keep UTF-16 on the clipboard, let's disambiguate it
by calling it "text/html;charset=ISO-10646-UCS-2" or
"text/html;charset=UTF-16" and starting it with the byte order mark (FF FE).
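The three payloads from point 1 can be produced from the same string. This Python sketch is only an illustration; the function names are hypothetical, and the UTF-16 variant assumes little-endian data, which is what the FF FE byte order mark announces.

```python
def as_utf8(html: str) -> bytes:
    """Payload for "text/html;charset=UTF-8"."""
    return html.encode("utf-8")


def as_latin1_with_ncrs(html: str) -> bytes:
    """Payload for plain "text/html": Latin-1, anything else as &#xNUMBER;."""
    return "".join(
        ch if ord(ch) < 0x100 else f"&#x{ord(ch):X};" for ch in html
    ).encode("latin-1")


def as_utf16_with_bom(html: str) -> bytes:
    """Payload for "text/html;charset=UTF-16": FF FE, then little-endian UTF-16."""
    return b"\xff\xfe" + html.encode("utf-16-le")


print(as_latin1_with_ncrs("5\u20ac"))  # b'5&#x20AC;'
```

The size trade-off is easy to see per character: U+05E9 (Hebrew shin) takes 2 bytes in UTF-8 but 7 bytes as "&#x5E9;". In the UTF-16 payload every ASCII character carries a 0x00 byte, which is the NUL problem mentioned above.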
2. Base URI for objects. For objects embedded in the page (<IMG>,
<OBJECT>, <EMBED> etc.) with a relative URI, should we:
a) change their SRC to contain the absolute URI, or
b) keep the HTML data on the clipboard as a complete HTML document with
a DOCTYPE and a <BASE HREF="..."> tag?
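Both options are cheap to produce. In this sketch the helper names are hypothetical, and the regex is a deliberate simplification: it only rewrites double-quoted SRC attributes (so <OBJECT DATA=...> is not handled), where a real implementation would rewrite attributes on the DOM before serializing.

```python
import re
from html import escape
from urllib.parse import urljoin

# Simplification: only double-quoted SRC attributes are rewritten.
SRC_RE = re.compile(r'(\bsrc\s*=\s*")([^"]*)(")', re.IGNORECASE)


def absolutize_src(fragment: str, base_uri: str) -> str:
    """Option (a): replace relative SRC values with absolute URIs."""
    return SRC_RE.sub(
        lambda m: m.group(1) + urljoin(base_uri, m.group(2)) + m.group(3),
        fragment,
    )


def wrap_as_document(fragment: str, base_uri: str) -> str:
    """Option (b): ship a complete document carrying a <BASE HREF>."""
    return (
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">\n'
        f'<HTML><HEAD><BASE HREF="{escape(base_uri, quote=True)}"></HEAD>\n'
        f"<BODY>{fragment}</BODY></HTML>"
    )


print(absolutize_src('<IMG SRC="img/a.png">', "http://example.com/dir/page.html"))
# <IMG SRC="http://example.com/dir/img/a.png">
```

Option (a) keeps the fragment pasteable anywhere; option (b) leaves the markup untouched but relies on the receiver honouring <BASE>.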
3. An additional idea I had is a "text/html;version=3.0" type, which would
contain the closest possible approximation of the style, built by adding
HTML formatting tags just like current Mozilla does. This format would be
mostly useful for things like GTKHTML, which has limited HTML support
and obviously no CSS.
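On the receiving side, a paste target would look at the offered target names and pick the best one it understands. This sketch assumes a preference order of its own (a CSS-less viewer such as GTKHTML would favour version=3.0, a CSS-capable one UTF-8) and skips UTF-16 entirely; the function names and ranking are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def parse_target(target: str) -> Tuple[str, Dict[str, str]]:
    """Split e.g. 'text/html;charset=UTF-8' into ('text/html', {'charset': 'UTF-8'})."""
    mime, *params = [part.strip() for part in target.split(";")]
    parsed = {}
    for param in params:
        if "=" in param:
            key, value = param.split("=", 1)
            parsed[key.strip().lower()] = value.strip()
    return mime.lower(), parsed


def rank(mime: str, params: Dict[str, str], supports_css: bool) -> Optional[int]:
    """Lower is better; None means 'not usable' (assumed preference order)."""
    if mime != "text/html":
        return None
    if params.get("version") == "3.0":
        return 2 if supports_css else 0
    charset = params.get("charset", "").upper()
    if charset == "UTF-8":
        return 0 if supports_css else 1
    if charset == "":
        return 1 if supports_css else 2
    return None  # e.g. UTF-16 is ignored in this sketch


def pick_target(offered: List[str], supports_css: bool) -> Optional[str]:
    candidates = []
    for target in offered:
        mime, params = parse_target(target)
        r = rank(mime, params, supports_css)
        if r is not None:
            candidates.append((r, target))
    # min() keeps the first of equally ranked targets, i.e. the offer order.
    return min(candidates, key=lambda c: c[0])[1] if candidates else None
```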
On my side, I'll try to implement this in Mozilla (unless one of you
more familiar with the beast steps forward) and I'll make sure KHTML
works similarly. People in the KOffice team are also interested in
this. I'll try to promote this for inclusion in GTKHTML (Ximian
Evolution's HTML viewer / editor) as well.
Looking forward to your comments and suggestions.
----- End forwarded message -----