Hi Tim,

If I were in your shoes, my plan of attack would be to hide as many of the input "rules" as you can from your users and do the conversion for them. It certainly won't be trivial, but both of the examples you described could be solved by the correct pre-processing before the text is inserted into XML.
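Here is a minimal sketch of that kind of pre-processing in Python, assuming the back end can run Python. The thread itself only mentions PHP, Java and JavaScript, so treat this as an illustration of the idea rather than a recommendation of a stack. The helper names `normalize_entry` and `to_xml` are hypothetical. The sketch decodes whatever character references the user typed into plain Unicode with the standard library's `html.unescape`, then lets the XML serializer re-escape `&`, `<` and `>`, so users never have to type `&amp;` themselves.

```python
import html
import xml.etree.ElementTree as ET


def normalize_entry(raw: str) -> str:
    """Turn any HTML character references the user typed into plain Unicode.

    A bare '&' that does not start a recognizable reference is left as-is.
    """
    return html.unescape(raw)


def to_xml(tag: str, raw: str) -> str:
    """Build an XML element (hypothetical helper) holding the normalized entry.

    ElementTree escapes '&', '<' and '>' itself when serializing.
    """
    elem = ET.Element(tag)
    elem.text = normalize_entry(raw)
    return ET.tostring(elem, encoding="unicode")


if __name__ == "__main__":
    print(to_xml("title", "Red & White"))
    # <title>Red &amp; White</title>
    print(to_xml("title", "&ldquo;Red &amp; White&rdquo;"))
    # <title>“Red &amp; White”</title>
```

With this lenient policy, both ways of typing the title are accepted and stored the same way. The trade-off is that a bare `&` directly followed by a legacy entity name written without a semicolon (for example `&copy2014`) is still decoded, so a stricter policy may suit some fields better.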
My recommendation would be to do this processing on the back end rather than the front end, because you can never truly guarantee what will be sent to your back end. If you want to throw some PHP code into the mix, PHP has a function for this, htmlentities: http://www.w3schools.com/php/func_string_htmlentities.asp

Searching around, I found an equivalent function for Java: http://www.dalesandro.net/java-string-to-html-entities-encoder/

I was even able to find a recreation of this function in JavaScript: http://phpjs.org/functions/htmlentities/ But unless you have JavaScript on your back end, you should still make sure to do back-end validation as well.

As for pulling data out in the correct format, web browsers and most XML readers will do the correct decoding conversion for you.

Best,
Rob

Rob Szkutak
Associate Consultant
MarkLogic Corporation
Cell +1.716.562.8464
www.marklogic.com

________________________________
From: General mailing list on behalf of Tim
Sent: Tuesday, October 07, 2014 3:23 AM
To: 'MarkLogic Developer Discussion'
Subject: [MarkLogic Dev General] Handling HTML entry of encoded characters for entry into XML

Hi Folks,

I am creating an HTML entry form for inputting text that can extend beyond the ASCII range, so the trick is standardizing the input of entities, and of course deciding what to do with the ampersand character. There are two parts to this challenge:

1. Creating the text entry UI, providing rules for inputting entities, and detecting and reporting invalid entries.
2. Converting the entered entities into their corresponding UTF-8 values for storage in MarkLogic, so that the exported values can be converted back into the appropriate entities for HTML display or for export, such as to a Microsoft Word document.
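For the second part, getting stored text back out as entities, a hedged Python sketch follows. It assumes the text is stored as plain Unicode characters and that the target (an HTML page, say) wants `&`, `<` and `>` escaped, plus every non-ASCII character written as a numeric reference. That last step is optional: a page served as UTF-8 can carry the characters directly, and export to Word is not covered here. `to_ascii_html` is a hypothetical name; `html.escape` and the `xmlcharrefreplace` codec error handler are standard Python.

```python
import html


def to_ascii_html(stored: str) -> str:
    """Escape stored Unicode text (hypothetical helper) for HTML text content.

    '&', '<' and '>' become entities, and every non-ASCII character becomes
    a numeric character reference, so the result is pure ASCII.
    """
    escaped = html.escape(stored, quote=False)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    stored = "\u201cRed & White\u201d"        # “Red & White” as stored text
    exported = to_ascii_html(stored)
    print(exported)                           # &#8220;Red &amp; White&#8221;
    assert html.unescape(exported) == stored  # decoding restores the original
```

Because the stored form is plain characters, the choice of named, numeric or no references can be made per export target instead of being baked into the data.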
It seems that I cannot have my cake and eat it too. For example, if I want to allow the user to simply insert a title with an ampersand, they could enter:

Red & White

But if I want to allow them to enter other encoded values, such as:

&ldquo;Red &amp; White&rdquo;

then there needs to be the expectation that entering an ampersand by itself is disallowed, and that the former title must be supplied as:

Red &amp; White

So how do folks tend to deal with this issue for each of the two parts I describe above?

Thanks for any help with this. It seems like a simple issue, but it has a lot of complexity, especially when folks allow proprietary named and numbered HTML encodings with Private Use Area Unicode mappings. Is this the bane of UI entry for XML UTF-8 mapping or what? :)

Tim M.
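For the "detecting and reporting invalid entries" part, one hedged approach is to scan the input for anything shaped like a character reference and report the ones that cannot be converted cleanly. This Python sketch assumes the HTML5 named-entity table (`html.entities.html5`) as the list of acceptable names. It also assumes that references to Private Use Area code points (U+E000 to U+F8FF) should be flagged, since proprietary mappings will not mean the same thing in other tools. `find_invalid_refs` and its reason strings are hypothetical.

```python
import re
from html.entities import html5

# Anything shaped like a reference: decimal, hex, or named, ending in ';'.
_REF = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def _allowed_in_xml(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 Char production."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def find_invalid_refs(text: str) -> list[tuple[str, str]]:
    """Return (reference, reason) pairs for references worth reporting."""
    problems = []
    for m in _REF.finditer(text):
        ref, body = m.group(0), m.group(1)
        if not body.startswith("#"):
            # html5 keys include the trailing ';', e.g. 'amp;' and 'ldquo;'.
            if body + ";" not in html5:
                problems.append((ref, "unknown named entity"))
            continue
        is_hex = body[1] in "xX"
        cp = int(body[2:], 16) if is_hex else int(body[1:])
        if not _allowed_in_xml(cp):
            problems.append((ref, "not allowed in XML"))
        elif 0xE000 <= cp <= 0xF8FF:
            problems.append((ref, "private use area"))
    return problems


if __name__ == "__main__":
    print(find_invalid_refs("&ldquo;Red &amp; White&rdquo;"))  # []
    print(find_invalid_refs("&bogus; &#0; &#xE001;"))
```

Bare ampersands such as the one in "Red & White" are not reported, so a check like this can sit alongside lenient decoding: users may type a plain `&`, while typos such as `&ldqou;` are still caught and shown back to them.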
_______________________________________________
General mailing list
http://developer.marklogic.com/mailman/listinfo/general
