So many years ago, when we were an SGML-only shop, somebody somewhere decided it would be a Great Idea to stick SGML entities, e.g., &uml;, into the "ascii" fields of our databases. For example, an author table might have a field fname_ascii holding a first name, and when one queries it one finds values that are indeed made up of ASCII characters -- but which have to be run through an SGML/XML entity resolver to be usable!
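To make concrete what that resolution step has to do, here is a minimal sketch in Python rather than XQuery, assuming the stored entities all come from the standard HTML 4 set. The function name entities_to_ncr is made up for illustration, and the choice to leave unknown names and the XML predefined entities untouched is an assumption, not something our data dictates.

```python
import re
from html.entities import name2codepoint

# Matches named references such as &uml; or &eacute; (a name followed by ';').
_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

# Entities every XML parser already understands; assumed safe to leave as-is.
_XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def entities_to_ncr(text: str) -> str:
    """Replace known HTML 4 named entities with decimal numeric character references."""

    def repl(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name in _XML_PREDEFINED:
            return match.group(0)
        codepoint = name2codepoint.get(name)
        if codepoint is None:
            # Unknown name: keep the original text so nothing is silently lost.
            return match.group(0)
        return "&#%d;" % codepoint

    return _ENTITY.sub(repl, text)
```

With this, entities_to_ncr("Ren&eacute;") gives "Ren&#233;", while a name outside the HTML 4 table passes through unchanged and can be flagged for manual review.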
Of course, the idea at the time was to write to browsers and not worry about the contents, so nobody bothered to construct a mapping of the entities they used; they just used HTML entities. Bah. So now it's our turn to clean up the mess.

I was using mlsql, which was very neat. Naturally it treats the various fields of the DB as text and slurps up '&uml;' as '&amp;uml;'. So I wanted to see if xdmp:tidy could deal with it. My first attempts at processing the entire sql:response were not productive: if I pass in an element() it translates &uml; into an NCR but strips all elements, and if I pass in the result of xdmp:quote($sql_result) it leaves the &uml; declarations unmolested.

So I ended up writing this, but I was wondering if anyone has done something similar in perhaps a more efficient manner?

    import module namespace sql = "http://xqdev.com/sql"
      at "/modules/mlsql/sql.xqy"

    declare namespace html="http://www.w3.org/1999/xhtml"

    (:~
     : Run the text() nodes of an element through xdmp:tidy (useful for
     : translating escaped HTML entities into NCRs).
     : @param $input element to clean.
     : @return $input with any text() nodes passed via xdmp:tidy.
     :)
    define function tidyText($input as element()) as element()
    {
      element {node-name($input)}
      {
        for $node in $input/node()
        return
          if ($node instance of element()) then
            tidyText($node)
          else if ($node instance of text()) then
            normalize-space(xdmp:tidy($node)/html:html/html:body/text())
          else
            $node
      }
    }

So:

    tidyText(sql:execute($query, $mlsqlserver, ()))

will return the sql:result with its text passed via tidy.

Jim

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
James A. Robinson
Stanford University HighWire Press      http://highwire.stanford.edu/
+1 650 7237294 (Work)                   +1 650 7259335 (Fax)
