Hi again, <warning> this is a long post </warning>
I'm still working on HTML forms where the user (me for the moment :) is supposed to input HTML into a text area that will be stored in an XML format. I'm still having problems, so I haven't written a SUMMARY post...

My new problem occurred last night while I was testing the system. I put in an anchor tag with a URL that has request parameters, like this:

<a href="http://www.something.net/apage.jsp?p1=hi&p2=bye">link</a>

Well, when I hit submit the form is supposed to come back filled out, but instead I get an error that states "the entity 'p2' must end with a ';'". So I did some searching on w3.org and sure enough, URLs in XHTML have to use '&amp;' instead of '&'. Arrgh, I know this will cause problems once people who are used to normal HTML start using this.

I'm considering writing a filter that will escape illegal characters on the way in and un-escape them going back to the user, but that seems like a bit of a pain. Combined with the problems I'm having making people type XML compliant HTML in the first place, I'm wondering if there's a completely different way I could do this.

I'm sure someone else out there has come across these problems before. They seem inevitable when building a webapp that uses XML on the backend and lets users edit some content. The users only marginally know HTML in the first place and can't be expected to follow the rules correctly every time. The app, after all, is supposed to be easy to use.

I would love to start some discussion on different ideas for handling these types of problems. They must be common among Cocoon users, and maybe we can come up with a set of solutions (HOW-TOs, Java helper classes, taglibs) to make life easier on Cocoon developers and end-users.

Here's my little list of requirements, issues, and assumptions when dealing with forms, user input, and XML.
1) My users are used to HTML, not XML
2) My users are not fail-proof, and are probably prone to occasional mistakes
3) Ideally I want them to be able to input HTML (non XML compliant), plain text, or XML (not HTML, but any XML. This is actually preferred, but sometimes users are just entering a news item or a BBS post, and it seems reasonable to allow them to use HTML for formatting rather than inventing my own XML dialect)
4) The data is going to be in an XML document/SAX stream at some point (either stored that way, or stored in a database and turned into XML through a generator)
5) Sometimes I want to run XSL transformations on the data when it is output
6) When editing the data, I'd like to have it appear exactly as the user typed it
7) But I'd also like to have the ability to clean it up (as an option)
8) The browsers like HTML 4 much better than XHTML, therefore the pages I send them work better if I use the HTMLSerializer

Here are some problems I've encountered so far.

1) Users don't follow XML rules very well (goes along with point 1)
2) The HTMLSerializer changes the user's data by turning <br/> into <br>, etc.
3) The XML Serializer changes the user's data by turning <textarea></textarea> into <textarea/>, etc.
4) Bad user input will cause SAXExceptions if it's not enclosed in CDATA sections

(Oh, to clarify here: I typically have two pages which show the data. One is the 'edit' page with the form, the other is where the data actually shows up, the 'viewing page'. The HTMLSerializer is no problem on the viewing page, just on the editing page.)

Some of these points interfere with some solutions. For example, I could wrap the data in a CDATA section to get around XML compliance, but then I wouldn't be able to run XSL transformations on it (correct me if I'm wrong anywhere). Maybe I could check if the data is XML compliant and wrap it only if it isn't.
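To make the escaping filter and the "wrap it only if it isn't compliant" idea a bit more concrete, here is a rough sketch in plain Java using only JAXP/SAX. The class and method names (UserInputFilter, escapeBareAmpersands, toStorableXml, and so on) are made up for illustration and are not part of Cocoon. It assumes the input is a fragment, so it is parsed inside a throwaway <wrapper> element, and it only handles bare ampersands, not other HTML habits like an unclosed <br>.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper for user-typed markup; names are illustrative only. */
final class UserInputFilter {

    // Matches an '&' that does NOT already start an entity or character
    // reference such as &amp; &#38; or &#x26;
    private static final Pattern BARE_AMP = Pattern.compile(
        "&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    /** On the way in: turn bare '&' into '&amp;'. */
    public static String escapeBareAmpersands(String input) {
        return BARE_AMP.matcher(input).replaceAll("&amp;");
    }

    /**
     * On the way back to the edit form: undo the escaping.
     * Note this is lossy: an '&amp;' the user typed on purpose also
     * comes back as '&'.
     */
    public static String unescapeAmpersands(String stored) {
        return stored.replace("&amp;", "&");
    }

    /** True if the fragment parses cleanly inside a dummy root element. */
    public static boolean isWellFormedFragment(String fragment) {
        String doc = "<wrapper>" + fragment + "</wrapper>";
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(doc)), new DefaultHandler());
            return true;
        } catch (Exception e) {
            // SAXParseException for malformed input; anything else also means "no"
            return false;
        }
    }

    /**
     * Escape ampersands first; if the result is well-formed, keep it as
     * real markup (so XSL can still see the tags). Otherwise fall back to
     * wrapping the untouched input in a CDATA section.
     */
    public static String toStorableXml(String input) {
        String escaped = escapeBareAmpersands(input);
        if (isWellFormedFragment(escaped)) {
            return escaped;
        }
        // "]]>" cannot appear inside CDATA, so split it across two sections
        return "<![CDATA[" + input.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }
}
```

With this, the anchor tag from my example would be stored as markup with '&amp;' in the URL, while something like "line one<br>line two" would end up in a CDATA section. The trade-off is the one mentioned above: anything that lands in CDATA is invisible to XSL transformations.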
Here are some ideas for solutions:

1) Create a new HTMLSerializer that can selectively determine which tags it will convert into HTML and which it will leave alone. This way you could specify that all textarea tags and their contents shouldn't be touched (I would think this would be a reasonable default feature anyway)
2) Create a jTidy-like program that will turn HTML into XHTML, but work for fragments (jTidy seems to only output complete HTML documents)
3) Create a class that can find an XML error and report it nicely back to the user so they can fix it. (I recall a demo with Cocoon 1.8.x that had something like this...)

Hmm, these three things might do it. The new serializer would work for editing, and the Tidy-like class would work for either storing the data as XML or just viewing it as XML. I think I have an idea on how to do the serializer, but it wouldn't rely on a transformer like the current one. I looked at the code for jTidy and there's a ton of classes, so I've yet to fully comprehend how it works. It might already be able to do what I want, and like I said, I saw something similar to 3) a year or so ago...

ok, that's my thoughts...

Justin
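
P.S. For idea 3), here is a minimal sketch of what "report it nicely" might look like, again in plain Java with JAXP/SAX. XmlErrorReporter and describeError are hypothetical names, not anything that exists in Cocoon. It relies on SAXParseException carrying a line and column number; the wording of the message itself comes from whatever parser is installed, so I wouldn't depend on it.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper that turns a parse failure into a message for the user. */
final class XmlErrorReporter {

    /**
     * Returns null if the fragment is well-formed, otherwise a short
     * description of the first error with its position in the user's text.
     */
    public static String describeError(String fragment) {
        // The dummy root sits on its own line, so the user's first line
        // is line 2 of the parsed document.
        String doc = "<wrapper>\n" + fragment + "\n</wrapper>";
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(doc)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            int userLine = e.getLineNumber() - 1; // undo the wrapper line
            return "Line " + userLine + ", column " + e.getColumnNumber()
                + ": " + e.getMessage();
        } catch (Exception e) {
            // I/O or parser configuration trouble rather than bad input
            return "Could not check the input: " + e.getMessage();
        }
    }
}
```

The edit page could call describeError on submit and show the returned text next to the textarea instead of letting the SAXException bubble up.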
