So many years ago, when we were an SGML-only shop, somebody somewhere
decided it would be a Great Idea to stick SGML entities, e.g., &uml;,
into the "ascii" fields of our databases. For example, an author table
might have a field fname_ascii holding a first name, and when one
queries it one finds what are indeed all ASCII characters -- but
which have to be run through an SGML/XML entity resolver to be
usable!

Of course, the idea at the time was to write to browsers, and not worry
about the contents, and so nobody bothered to construct a mapping of
the entities they used; they just used HTML entities. Bah.

So now it's our turn to clean up the mess. I was using mlsql, which was
very neat.  Naturally it treats the various fields of the DB as text
and slurps up '&uml;' as '&amp;uml;'.

So I wanted to see if xdmp:tidy could deal with it.  My first attempts
at processing the entire sql:response were not productive: if I pass in
an element() it translates &uml; into an NCR but strips all elements;
if I pass in the result of xdmp:quote($sql_result) it leaves the &uml;
references unmolested.

So I ended up writing this, but I was wondering if anyone has done
something similar in perhaps a more efficient manner?

import module namespace sql = "http://xqdev.com/sql"
  at "/modules/mlsql/sql.xqy"

declare namespace html="http://www.w3.org/1999/xhtml";

(:~
 : Run the text() nodes of an element through xdmp:tidy (useful for
 : translating escaped HTML entities into NCRs).
 : @param  $input element to clean.
 : @return $input with any text() nodes passed via xdmp:tidy.
 :)
define function tidyText($input as element())
as element()
{
  element {node-name($input)} {
    (: node() does not select attributes, so copy them explicitly :)
    $input/@*,
    for $node in $input/node()
    return
      if ($node instance of element())
      then tidyText($node)
      else if ($node instance of text())
      then normalize-space(xdmp:tidy($node)/html:html/html:body/text())
      else $node
  }
}

So:

  tidyText(sql:execute($query, $mlsqlserver, ()))

will return the sql:result with its text passed via tidy.
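
The core step can also be tried in isolation on a single string. What
follows is only a sketch: the sample value is made up, and it assumes
that xdmp:tidy returns a status node followed by the cleaned XHTML
(hence the [2]). It uses the same prolog style as the function above.

```xquery
declare namespace html = "http://www.w3.org/1999/xhtml"

(: Hypothetical sample value, standing in for one fname_ascii field. :)
let $raw := "J&amp;uuml;rgen"
(: Assumption: item 1 is tidy's status node, item 2 the cleaned XHTML. :)
let $tidied := xdmp:tidy($raw)[2]
(: The descendant axis works whether item 2 is a document or an element. :)
return normalize-space($tidied//html:body)
```

The intent is that the entity comes back resolved to the corresponding
character. Taking the string value of the whole body, rather than its
text() children, is a deliberate difference from tidyText: it avoids
an error should tidy wrap the text in another element.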

Jim

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
James A. Robinson                       [EMAIL PROTECTED]
Stanford University HighWire Press      http://highwire.stanford.edu/
+1 650 7237294 (Work)                   +1 650 7259335 (Fax)