Allowing '\B' as a non-word-boundary match is an extension to the
XQuery/XPath regular expression syntax, no? The query below ought to
throw an error when the XQuery version is declared as strict 1.0, but it
doesn't. (Are the regex extensions documented somewhere?)
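
For comparison, a minimal sketch of a split into single characters that
should not depend on any regex extension. It assumes only the standard
XQuery 1.0 functions string-to-codepoints() and codepoints-to-string();
the variable name is carried over from the query quoted below.

```xquery
xquery version "1.0";

(: Sketch only: split a string into one-character strings without using
   an empty pattern or '\B' in tokenize(). :)
declare variable $alphabet :=
  for $cp in string-to-codepoints('abcdefghijklmnopqrstuvwxyz')
  return codepoints-to-string($cp);

for $x in $alphabet
return concat('[', $x, ']')
```

Since no regular expression is involved, the question of which patterns
the strict 1.0 mode accepts should not arise here.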
David
On Fri, 13 Feb 2009, Aaron Redalen wrote:
> Eric, that's the correct behavior. Your pattern matches the empty
> string anywhere, including at the start of your string. You can get
> the behavior you're looking for by matching empty strings not at the
> beginning or end of a word, using the expression '\B'.
>
> declare variable $alphabet := tokenize('abcdefghijklmnopqrstuvwxyz', '\B');
>
> for $x in $alphabet
> return concat('[', $x, ']')
>
> => [a] [b] [c] [d] [e] [f] [g] [h] [i] [j] [k] [l] [m] [n] [o] [p] [q] [r]
> [s] [t] [u] [v] [w] [x] [y] [z]
>
>
> Aaron Redalen
> Senior Consultant, Federal
> Mark Logic Corporation
> +1 240 688 7433 Phone
> [email protected]
> www.marklogic.com
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Eric Palmitesta
> Sent: Friday, February 13, 2009 3:05 PM
> To: ML Developer Mailing List
> Subject: [MarkLogic Dev General] tokenize returns empty string on "" pattern
>
> declare variable $alphabet := tokenize('abcdefghijklmnopqrstuvwxyz', '');
>
> for $x in $alphabet
> return concat('[', $x, ']')
>
> <v:results v:warning="more than one node">
> [] [a] [b] [c] [d] [e] [f] [g] [h] [i] [j] [k] [l] [m] [n] [o] [p] [q]
> [r] [s] [t] [u] [v] [w] [x] [y] [z]
> </v:results>
>
> Why am I getting the empty string at the beginning of the sequence
> returned from the above call to tokenize? Is this result expected?
>
> Eric
> _______________________________________________
> General mailing list
> [email protected]
> http://xqzone.com/mailman/listinfo/general
>
--
David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 801079, Charlottesville, VA 22904-4318 USA
Courier: 310 Old Ivy Way, Suite 302, Charlottesville VA 22903
Email: [email protected] Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/