Bruno Dumon wrote:
...

* different users of the widget (like the doco project vs the project
where we need it) will likely require different subsets of HTML to be
used.

* support for both Mozilla and IE is important. Other browsers should
fall back to a textarea with raw HTML in it.

* the HTML produced by the editor should be cleaned (i.e. not supported
tags & attributes removed) and normalized (formatted). The goal of this
is to deliver a nice XHTML-subset-doc for storage, and to show nice HTML
to people editing it manually. Hopefully this will also make it possible
to do meaningful text-based diffs.



I have done some work on this. I first wrote a js html editor for IE (>5.5) to be used in an XML content management system. For this we needed to clean the html and convert it to xhtml so that it could be processed with xslt when displaying pages.

One approach that I've tried is to generate the xhtml from the browser dom with javascript, i.e. walk the tree and recursively generate <TAG> ... </TAG> entries, putting quotes around all attribute values. This could then be postprocessed on the server by parsing it with an XML parser and manipulating the DOM tree. This however proved to be a slight nightmare due to js/dom bugs in IE 5.5. If you'd be willing to drop 5.5 support it would be easier, but it might also be possible to do this using more IE-specific js constructions, with which I'm not particularly familiar.
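
A minimal sketch of that tree walk, to make the idea concrete. It is not the code from our system: toXhtml, escapeText and EMPTY_ELEMENTS are made-up names, and it only assumes that nodes expose nodeType, nodeName, nodeValue, attributes and childNodes as in the W3C DOM. It makes no attempt to work around the IE 5.5 bugs mentioned above.

```javascript
// Sketch only: serialize a DOM-like node tree to XHTML text.
// toXhtml, escapeText and EMPTY_ELEMENTS are hypothetical names.
var ELEMENT_NODE = 1;
var TEXT_NODE = 3;

// Elements written in the short "<br />" form (illustrative subset).
var EMPTY_ELEMENTS = { br: true, hr: true, img: true };

// Escape the characters that are special in XML text and attribute values.
function escapeText(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function toXhtml(node) {
  if (node.nodeType === TEXT_NODE) {
    return escapeText(node.nodeValue);
  }
  if (node.nodeType !== ELEMENT_NODE) {
    return ""; // comments and other node types are dropped
  }
  var name = node.nodeName.toLowerCase(); // XHTML uses lower-case names
  var out = "<" + name;
  var attrs = node.attributes || [];
  for (var i = 0; i < attrs.length; i++) {
    // Assumption: skip attributes the browser reports as not specified.
    if (attrs[i].specified === false) {
      continue;
    }
    out += " " + attrs[i].name.toLowerCase() +
           '="' + escapeText(attrs[i].value) + '"';
  }
  var children = node.childNodes || [];
  if (children.length === 0 && EMPTY_ELEMENTS[name] === true) {
    return out + " />";
  }
  out += ">";
  for (var j = 0; j < children.length; j++) {
    out += toXhtml(children[j]);
  }
  return out + "</" + name + ">";
}
```

In a browser this would be called on the editor's body element (or on each of its children); the result can then be handed to an XML parser on the server.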

Eventually we ended up doing this completely server side: I wrote one component to fix the html to be xhtml, and after that I use an XML parser to remove all unwanted attributes and tags.

The biggest problem while handling the html is that you also have to parse Word html that is pasted into the editor, and the html that Word produces is truly gruesome!

While the server side solution works well for all the html garbage that I have encountered so far, it is not completely satisfactory: when you paste html into the editor you're looking at the unprocessed html, and once it has been processed by the server a lot will have been removed, so it can look rather different. One could try to explain this to the user, but it's better to filter the html directly after pasting it, so the user will not get confused.
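
Such a filter could be a simple whitelist of tags and attributes, which would also let different users of the widget configure different subsets. A rough sketch, not taken from our code: ALLOWED, DROP_WITH_CONTENT and cleanNode are hypothetical names, and it works on a plain DOM-like tree and returns a cleaned copy instead of changing the live document.

```javascript
// Sketch only: whitelist cleanup of a DOM-like tree.
// Hypothetical configuration: allowed tag -> allowed attribute names.
var ALLOWED = {
  p: [], b: [], i: [], ul: [], li: [],
  a: ["href"]
};

// Elements removed together with everything inside them.
var DROP_WITH_CONTENT = { script: true, style: true };

// Returns an array of cleaned nodes: a disallowed element is replaced
// by its (cleaned) children, so its text content survives.
function cleanNode(node, allowed) {
  if (node.nodeType === 3) {
    return [{ nodeType: 3, nodeValue: node.nodeValue }];
  }
  if (node.nodeType !== 1) {
    return []; // comments and other node types are dropped
  }
  var name = node.nodeName.toLowerCase();
  if (DROP_WITH_CONTENT[name] === true) {
    return [];
  }
  var kids = [];
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    kids = kids.concat(cleanNode(children[i], allowed));
  }
  if (!Object.prototype.hasOwnProperty.call(allowed, name)) {
    return kids; // unwrap the unsupported tag, e.g. Word's <span> or <o:p>
  }
  var attrs = [];
  var source = node.attributes || [];
  for (var j = 0; j < source.length; j++) {
    var attrName = source[j].name.toLowerCase();
    if (allowed[name].indexOf(attrName) !== -1) {
      attrs.push({ name: attrName, value: source[j].value });
    }
  }
  return [{ nodeType: 1, nodeName: name, attributes: attrs, childNodes: kids }];
}
```

Running the same whitelist right after a paste and again on the server would keep what the user sees close to what gets stored.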

I'm now in the process of writing an editor component that can handle IE and Mozilla. It is in a working state, but the code needs to be cleaned up and some stuff still needs to be written (a table editor, a url editor, etc.). It is however for a closed source system. I could raise the question of whether we would be willing to release it as open source.

My first thought was to do this cleanup stuff serverside (could be as
simple as an XSL, which would make it easily customisable too). However
it seems like you want to do all that on the client side?



This won't work: you need valid xml to use xsl, and the IE html in particular can be very troublesome to fix.

* Currently in e.g. Linotype the source for the editor (thus of the
iframe) is fetched separately from the main page. This is harder to do
with cforms since then the pipeline from which the content is fetched
should also have access to the cforms Form which is stored somewhere in
a variable in a flowscript. For the cforms widget it would be easier I
think to embed the HTML directly in the page (e.g. as a Javascript
variable). This also makes it possible to assign the content either to
the html editor or the textarea depending on what the client supports.

* Automatic image upload: still need to think more about this. After
pressing the submit button (and afterwards possibly showing the form
again), the images will need to become available in the URL space. How
that's done will probably differ from application to application so we
could put that behaviour behind an interface.




This is an interesting problem. Stefano talked about embedding it into the document; how would you want to do this? That would be the best solution for an embeddable component!


* wiki syntax support: we have no need for this, so don't expect any
effort from me on that.




Regards, Marc.


