Hi Gary,

Word boundaries are nothing but syntactic sugar in regular expressions for
engines supporting lookahead and lookbehind. They're defined by [1] as all positions

> - Before the first character in the string, if the first character is a word 
> character.
> - After the last character in the string, if the last character is a word 
> character.
> - Between two characters in the string, where one is a word character and the 
> other is not a word character.

This can easily be written as

    ((?<=\w)(?!\w)|(?<!\w)(?=\w))

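which literally describes only the third rule, but it covers the first two as
well: at the start and end of the string there is no adjacent character, so the
negative lookarounds `(?<!\w)` and `(?!\w)` succeed there, just as if `^` and
`$` were "non-word characters".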
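To check the equivalence rather than take it on trust, here is a minimal sketch in Java, since you are already calling Java's regex engine. It assumes `java.util.regex` and plain ASCII input; the class name `WordBoundaryDemo` and the helper `positions` are made up for illustration. It collects every position where each zero-width pattern matches, so the two lists can be compared.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of BaseX or any library.
final class WordBoundaryDemo {
    // The built-in word boundary.
    static final Pattern BUILT_IN = Pattern.compile("\\b");

    // The lookaround rewrite from this message.
    static final Pattern LOOKAROUND =
        Pattern.compile("((?<=\\w)(?!\\w)|(?<!\\w)(?=\\w))");

    // Collects the start offset of every (zero-width) match in the input.
    static List<Integer> positions(Pattern pattern, String input) {
        List<Integer> result = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            result.add(matcher.start());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "foo bar, baz";
        // Both lines print [0, 3, 4, 7, 9, 12] for this ASCII input.
        System.out.println(positions(BUILT_IN, input));
        System.out.println(positions(LOOKAROUND, input));
    }
}
```

Positions 0 and 12 are the start and end of the string, so the first two rules are indeed covered by the lookarounds alone. For non-ASCII text the two patterns may differ depending on the Java version and flags, so treat this as a check for the ASCII case only.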

Using non-XQuery functions (such as calling Java from XQuery) will prevent
future (hopefully soon) performance optimizations regarding parallel execution,
so it is better to stick to XQuery's default regex whenever possible.

Kind regards from Lake Constance, Germany,
Jens Erat

[1]: http://www.regular-expressions.info/wordboundaries.html


On 21.10.2012 at 19:35, The Trainspotter <wy...@btinternet.com> wrote:

> Hi Christian,
> 
> The regular expression capability I was missing was the word boundary \b 
> matching. I followed the Java bindings example so I can now use the Java 
> String.matches() function which allows me to use the \b match (and others 
> too) which are not part of the standard regex capability. This performs very 
> well, so I think you can hold off adding another extension.
> 
> Cheers,
> Gary
> 
> From: Christian Grün <christian.gr...@gmail.com>
> To: The Trainspotter <wy...@btinternet.com> 
> Cc: "basex-talk@mailman.uni-konstanz.de" <basex-talk@mailman.uni-konstanz.de> 
> Sent: Sunday, 21 October 2012, 18:23
> Subject: Re: [basex-talk] Using full Java regular expressions
> 
> Hi Gary,
> 
> BaseX provides the full XQuery 3.0 regular expression syntax [1,2];
> maybe it already contains the features you need for your queries? If
> not, could you give us a hint which ones you are missing?
> 
> While we could add an additional flag to the regex evaluator in BaseX,
> we are generally hesitant to do so, because it would be yet another
> vendor (i.e., Saxon and BaseX)-specific extension.
> 
> Best,
> Christian
> 
> [1] http://www.w3.org/TR/xpath-functions-30/#regex-syntax
> [2] http://www.w3.org/TR/xmlschema-2/#regexs
> ___________________________
> 
> > I'm currently converting my project to use BaseX instead of Saxon. One thing
> > you can do in Saxon is provide a flag (an exclamation mark) to your regular
> > expression to tell the matches function to use the Java regular expression
> > processor, rather than the rather cut down expressions available in the
> > XQuery spec.
> >
> > Is there anything similar in BaseX?
> >
> > If not what do you recommend to define a Java regular expression based
> > function for XQuery?
> >
> > Thanks in advance,
> > Gary
> >
> > _______________________________________________
> > BaseX-Talk mailing list
> > BaseX-Talk@mailman.uni-konstanz.de
> > https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
> >