Hi Santhosh,

A quite different approach is to use cts:tokenize. It splits a string into
a sequence of typed tokens: words are returned as cts:word, punctuation as
cts:punctuation, whitespace as cts:space, etc. I used it to split text into
words and apply some normalization to the tokens:

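(: $text holds the input string :)
for $word in
  cts:tokenize(
    fn:lower-case(
      (: strip diacritics: decompose (NFD), then drop combining marks :)
      fn:replace(fn:normalize-unicode($text, 'NFD'), '[\p{M}]', '')
    )
  )[. instance of cts:word]  (: keep word tokens only :)
return cts:stem($word)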

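Note: the normalize-unicode/replace trick removes diacritics. NFD decomposes
an accented character into its base letter plus combining marks, and
'[\p{M}]' then deletes those marks.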
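The same diacritic trick also works without MarkLogic's cts functions. Below is a minimal sketch in standard XQuery that folds accented letters to their base letter first and then keeps only A-Z, a-z and 0-9, so "Café" becomes "Cafe" rather than "Caf". It assumes the processor supports the 'NFD' form of fn:normalize-unicode (MarkLogic does, as the query above shows; the specification only requires 'NFC'), and local:alnum-folded is a hypothetical helper name.

```xquery
xquery version "1.0";

(: Hypothetical helper: fold accented letters to their base letter,
   then keep only ASCII letters and digits. :)
declare function local:alnum-folded($s as xs:string) as xs:string
{
  (: NFD splits e.g. "é" into "e" followed by a combining acute accent;
     this replace drops every run of combining marks (\p{M}). :)
  let $folded := fn:replace(fn:normalize-unicode($s, 'NFD'), '\p{M}+', '')
  (: Everything that is not A-Z, a-z or 0-9 is removed. :)
  return fn:replace($folded, '[^A-Za-z0-9]+', '')
};

local:alnum-folded("Café 42, Brûlée!")
(: returns "Cafe42Brulee" :)
```

Letters that have no canonical decomposition, such as ø or ß, are not folded and are still removed by the second replace.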

Kind regards,
Geert

> -----Original Message-----
> From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com]
> On Behalf Of Jakob Fix
> Sent: Thursday, 10 May 2012 16:23
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Need to Remove spaces,
> punctuations, parens and ect., from the given string (need to remove all
> character other than A-Z and 0-9)
>
> Hi,
>
> it's probably easier to declare the character groups you want to
> keep and exclude everything else, like so:
>
> let $string := "AB cd/EF;gh"
> return replace($string, '[^a-zA-Z0-9]+', '') (: everything that's not a
> letter or a digit will be removed :)
> ==> "ABcdEFgh"
>
> cheers,
> Jakob.
>
>
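Since the original question asks to keep digits as well as letters, the negated class needs 0-9 in it. Below is a minimal sketch in standard XQuery 1.0 (it should run in MarkLogic too), applied to the examples from the question; local:alnum-only is a hypothetical helper name.

```xquery
xquery version "1.0";

(: Hypothetical helper: [^a-zA-Z0-9]+ matches every run of characters
   that is not an ASCII letter or digit, and each run is replaced by "". :)
declare function local:alnum-only($s as xs:string) as xs:string
{
  fn:replace($s, '[^a-zA-Z0-9]+', '')
};

for $input in ("San & co.,", "It is a string", "New (value)",
               "At,the hill +  school", "Oh!.. is it, I don’t know")
return local:alnum-only($input)
(: returns "Sanco", "Itisastring", "Newvalue", "Atthehillschool",
   "OhisitIdontknow" :)
```

With a negated class, the characters to drop never have to be listed one by one. The flip side is that accented letters are dropped entirely ("Café" becomes "Caf"), which is where the NFD normalization from the cts:tokenize reply helps.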
> On Thu, May 10, 2012 at 4:00 PM, Rajasekaran, Santhosh
> <santhosh.rajaseka...@hmhpub.com> wrote:
> > Hi Folks,
> >
> > I have the following requirement in XQuery.
> >
> > Given a string, I need to remove spaces, punctuation, parentheses, etc.,
> > i.e. every character except letters (A-Z or a-z) and digits (0-9).
> >
> > E.g.:
> >
> > Input                          Expected Output
> > San & co.,                     Sanco
> > It is a string                 Itisastring
> > New (value)                    Newvalue
> > At,the hill +  school          Atthehillschool
> > Oh!.. is it, I don’t know      OhisitIdontknow
> >
> > Please let me know how I can achieve this. Do I need to add all these
> > characters (spaces, punctuation, parentheses, etc.) to a regular
> > expression and replace them one by one using the fn:replace() function?
> >
> > Or is there a better approach?
> >
> > Thanks & Regards,
> >
> > Santhosh
> >
> > _______________________________________________
> > General mailing list
> > General@developer.marklogic.com
> > http://developer.marklogic.com/mailman/listinfo/general
> >