Hi Tim,

Maybe simplistic, but you could do it with a map:map and
fn:analyze-string.  On the way in, search for entity references and look
up their names in the map.  On the way out, search for the Unicode
character ranges you want to convert and look them up in the inversion
of the same map:map.  Something like:

xquery version "1.0-ml";

declare namespace s = "http://www.w3.org/2005/xpath-functions";

declare function local:replace-hits-from-map ($in, $regex, $map) {
      fn:string-join ((
          let $checked := fn:analyze-string ($in, $regex)
          for $bit in $checked/*
          let $text := $bit/text()
          return
              if ($bit/self::s:non-match) then $text else map:get ($map, $text)
      ), '')
};

(: a literal & has to be written as &amp; inside an XQuery string :)
let $conf := map:entry ('&amp;rarrow;', '→')
let $in := 'foo &amp;rarrow; bar'
let $imported := local:replace-hits-from-map ($in, '&amp;[^;]+;', $conf)
(: -$conf inverts the map; the upper bound of the range is an assumption :)
let $exported := local:replace-hits-from-map ($imported,
'[ÿ-&#xFFFD;]', -$conf)
return ($in, $imported, $exported)

You could serialize the map somewhere and deserialize it when needed.
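
A minimal in-memory sketch of that round trip, assuming MarkLogic's
behaviour of turning a map into a map:map element when it is placed in
an element constructor, and the map:map() constructor that accepts such
an element.  The entity-map wrapper name is made up for illustration; in
practice you would keep the element in a document rather than a variable.

```xquery
xquery version "1.0-ml";

let $conf := map:entry ('&amp;rarrow;', '→')
(: serialize: the map becomes a map:map element inside the wrapper :)
let $stored := <entity-map>{$conf}</entity-map>
(: deserialize: rebuild a map from the stored map:map element :)
let $restored := map:map ($stored/map:map)
return map:get ($restored, '&amp;rarrow;')
```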

Of course it could get huge depending on how much of Unicode you need.
When I had this it was just a case of people adding things as they
needed them, and translating short strings. So I was OK just having
obvious error checking and putting in the new mappings when they
arose.
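
A sketch of what such error checking could look like: fail loudly when a
hit has no mapping, instead of letting map:get return the empty sequence
and silently drop the text.  The function name local:replace-hits-strict
and the error name local:UNMAPPED are invented for this example.

```xquery
xquery version "1.0-ml";

declare namespace s = "http://www.w3.org/2005/xpath-functions";

declare function local:replace-hits-strict ($in, $regex, $map) {
    fn:string-join ((
        for $bit in fn:analyze-string ($in, $regex)/*
        let $text := fn:string ($bit)
        return
            (: pass unmatched text through untouched :)
            if ($bit/self::s:non-match) then $text
            (: known hit: substitute the mapped value :)
            else if (map:contains ($map, $text)) then map:get ($map, $text)
            (: unknown hit: raise an error so the mapping can be added :)
            else fn:error (xs:QName ('local:UNMAPPED'),
                           fn:concat ('No mapping for ', $text))
    ), '')
};

local:replace-hits-strict ('foo &amp;rarrow; bar', '&amp;[^;]+;',
    map:entry ('&amp;rarrow;', '→'))
```

Swapping the fn:error branch for one that returns $text unchanged would
give a lenient variant that leaves unknown hits in place.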

- Chris