Hi Tim,

Maybe simplistic, but you could do it with a map:map and fn:analyze-string. On the way in, search for entities and look up their names in the map. On the way out, search for the Unicode character ranges that you want to convert and look them up in the inversion of the map:map. Something like:
  xquery version "1.0-ml";

  declare namespace s = "http://www.w3.org/2005/xpath-functions";

  (: Replace each substring of $in that matches $regex with its value in $map.
     Text that does not match is copied through unchanged. :)
  declare function local:replace-hits-from-map ($in, $regex, $map)
  {
    fn:string-join ((
      let $checked := fn:analyze-string ($in, $regex)
      for $bit in $checked/*
      let $text := $bit/text()
      return
        if ($bit/self::s:non-match) then $text
        else map:get ($map, $text)
    ), '')
  };

  (: A literal "&" inside an XQuery string has to be written as "&amp;". :)
  let $conf := map:entry ('&amp;rarrow;', '→')
  let $in := 'foo &amp;rarrow; bar'
  let $imported := local:replace-hits-from-map ($in, '&amp;[^;]+;', $conf)
  (: Export from the imported string so the round trip is visible.
     -$conf is the inverted map (characters back to entity names).
     The upper bound of the range was lost in the archive; U+10FFFF is an
     assumption, so adjust the range to the characters you want converted. :)
  let $exported := local:replace-hits-from-map ($imported, '[ÿ-&#x10FFFF;]', -$conf)
  return ($in, $imported, $exported)

You could serialize the map somewhere and deserialize it when needed. Of course, it could get huge depending on how much of Unicode you need.

When I had this problem, it was just a case of people adding things as they needed them and translating short strings. So I was OK with just having obvious error checking and putting in the new mappings when they arose.

- Chris
