A quick test with XQuery shows that fn:replace doesn't have the problem:

fn:replace('2004XY2006', '(\d\d\d\d)xy(\d\d\d\d)', "$1 and $2", "i")
-> "2004 and 2006"

So this looks XSLT-specific.
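
For comparison outside MarkLogic, in the same spirit as the Saxon check mentioned in this thread, here is a minimal Python sketch of the group values this regex should produce. It assumes that Python's re module agrees with XPath regex semantics for this particular pattern (\d, a literal "xy", and the case-insensitive "i" flag). That holds here, but not for XPath regexes in general. The helper name date_groups is hypothetical. The sketch also shows the workaround from the original message: wrapping the intermediate characters in their own group moves the second date to group 3.

```python
import re

# Pattern from the thread; re.IGNORECASE stands in for the XPath "i" flag,
# so "xy" also matches the uppercase "XY" used in the fn:replace test.
PATTERN = r"(\d\d\d\d)xy(\d\d\d\d)"

# Workaround from the original message: make the intermediate characters a
# group too. This renumbers the second date to group 3.
WORKAROUND = r"(\d\d\d\d)(xy)(\d\d\d\d)"


def date_groups(pattern, text, *groups) -> list:
    """Return the requested capture groups of the first match, or [] if none."""
    m = re.search(pattern, text, re.IGNORECASE)
    return [m.group(g) for g in groups] if m else []


if __name__ == "__main__":
    print(date_groups(PATTERN, "2004xy2006", 1, 2))     # ['2004', '2006']
    print(date_groups(WORKAROUND, "2004xy2006", 1, 3))  # ['2004', '2006']
    # Same substitution as the fn:replace test above.
    print(re.sub(PATTERN, r"\1 and \2", "2004XY2006", flags=re.IGNORECASE))  # 2004 and 2006
```

Both groups come back populated, which is the output a conforming xsl:analyze-string should give for regex-group(1) and regex-group(2).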

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of David Sewell
Sent: Monday, September 08, 2014 9:20 AM
To: "neil bradley"; MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] xsl:analyze-string grouping bug?

I can confirm that this bug is evident in my installation of MarkLogic (7.0-3).
And a quick sanity check of the data and XSLT using Saxon produces the expected
output values inside <Date>. Can you file a bug report on ML's support site? If
not, and if no one from MarkLogic posts in response, I'll do that later today.

David

On Mon, 8 Sep 2014, "neil bradley" wrote:

> Hi,
>
> I think there is a bug in the MarkLogic implementation of the 
> <xsl:analyze-string> element.
>
> It seems that you cannot have more than one character between regex 
> groups.
>
> My original issue was date ranges, where there could be a “-” or “- ”
> between the dates, and I wasted some time investigating hyphens in
> character classes.
>
> But the problem occurs with ANY two characters, even if I know the 
> exact order. So, if the dates were going to be formatted like this:
>
>  <Dates>2004xy2006</Dates>
>
> Then it still fails when I try this:
>
> <xsl:template match="Dates">
>  <xsl:variable name="inputString"><xsl:value-of select="."/></xsl:variable>
>  <xsl:choose>
>    <xsl:when test="matches($inputString, '(\d\d\d\d)xy(\d\d\d\d)', 'i')">
>      <xsl:analyze-string select="$inputString" regex="(\d\d\d\d)xy(\d\d\d\d)" flags="i">
>        <xsl:matching-substring>
>          <Dates>
>            <Date><xsl:value-of select="regex-group(1)"/></Date>
>            <Date><xsl:value-of select="regex-group(2)"/></Date>
>          </Dates>
>        </xsl:matching-substring>
>      </xsl:analyze-string>
>    </xsl:when>
>  </xsl:choose>
> </xsl:template>
>
> The matches() function works fine, but the second <Date> element in
> the output is empty. So the regex attribute is not working correctly.
>
> The problem cannot be avoided by using a character class [...]*, or
> even by adding + or * to a single character!
>
> Incidentally, it does recover thereafter. So, if there is a third 
> group, then that will be output correctly.
>
> Of course, the workaround is to make the intermediate characters
> into a group too, even though I don't want that group. But I still
> think this is a bug that is worth noting.
>
> Neil.
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general

--
David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 400314, Charlottesville, VA 22904-4314 USA
Email: [email protected]   Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/
