Hello --

As part of building tests, I'm regularizing the text contents of some
Word documents into single strings, which makes it relatively easy to
check that no words have been lost or reordered when compared to other
stages of the process.

Regularization is a tactful way to put this particular atrocity:

let $stringTidy as function(xs:string+) as xs:string :=
  function($in as xs:string+) as xs:string {
    $in
    => string-join(' ')
    => replace(xquery:eval($menuMatch), '')
    => replace('&#xA;', ' ')
    => replace('&#x9;', ' ')
    => replace('&#xD;', ' ')
    => replace('\p{Zs}', ' ')
    => replace(' +', ' ')
    => replace(' ([,\.;:])', '$1')
    => replace('^ ', '')
    => replace(' $', '')
  }

$menuMatch gets stripped out of the Word text because it's added by
processing, rather than being present in the source file that generates the
other half of the comparison.  (It's currently U+1405, ᐅ, though I devoutly
hope this doesn't matter!)  It gets read from an XSL source document, which
I've included in minimal form, along with some sample data and a minimal-ish
query.

If I use $menuMatch in the replace, it doesn't work, in the sense that the
ᐅ character is NOT removed from the string.  If I xquery:eval() it, as
here, the replace does work to remove the ᐅ from the string.  I don't
expect to need xquery:eval to use a variable as the second argument of
replace().  Am I wrong?  Has the pile of arrow operators exceeded the
bounds of reason?
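
To make the two cases concrete, here is a minimal sketch of both variants,
plus a code point dump that would show whether $menuMatch holds anything
besides the bare character.  The literal assigned to $menuMatch and the
sample $in are hypothetical stand-ins, not the attached data; in particular,
the quote marks inside the value are only a guess at what the XSL source
might contribute.

```xquery
(: Hypothetical stand-in for the value read from the XSL document.
   ASSUMPTION: it carries its own quote marks, the way an XPath string
   literal in a select attribute would.  The real value may differ. :)
let $menuMatch := "'ᐅ'"
(: Hypothetical sample text, not taken from the attachment. :)
let $in := 'Home ᐅ Settings'
return (
  (: Shows every character actually present in the pattern. :)
  string-to-codepoints($menuMatch),
  (: Plain variable: the quote marks are part of the regex. :)
  $in => replace($menuMatch, ''),
  (: Evaluated variable: the string literal is unwrapped first. :)
  $in => replace(xquery:eval($menuMatch), '')
)
```

If the code point dump for the real $menuMatch shows anything other than
5125 (U+1405) alone, that would account for the difference between the two
replace() calls.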

Thanks!
Graydon

<<attachment: basex-test.zip>>
