I can confirm that this bug is evident in my installation of MarkLogic (7.0-3). A quick sanity check of the same data and XSLT using Saxon produces the expected output values inside <Date>. Can you file a bug report on MarkLogic's support site? If not, and if no one from MarkLogic posts in response, I'll do that later today.

David

On Mon, 8 Sep 2014, "neil bradley" wrote:

Hi,

I think there is a bug in the MarkLogic implementation of the
<xsl:analyze-string> element.

It seems that you cannot have more than one character between regex
groups.

My original issue was date ranges, where there could be a "-" or "- "
between the dates, and I wasted some time investigating hyphens in
character classes.

But the problem occurs with ANY two characters, even if I know the
exact order. So, if the dates were going to be formatted like this:

 <Dates>2004xy2006</Dates>

Then it still fails when I try this:

<xsl:template match="Dates">
 <xsl:variable name="inputString"><xsl:value-of select="."/></xsl:variable>
 <xsl:choose>
   <xsl:when test="matches($inputString, '(\d\d\d\d)xy(\d\d\d\d)', 'i')">
     <xsl:analyze-string select="$inputString"
                         regex="(\d\d\d\d)xy(\d\d\d\d)" flags="i">
       <xsl:matching-substring>
         <Dates>
           <Date><xsl:value-of select="regex-group(1)"/></Date>
           <Date><xsl:value-of select="regex-group(2)"/></Date>
         </Dates>
       </xsl:matching-substring>
     </xsl:analyze-string>
   </xsl:when>
 </xsl:choose>
</xsl:template>

The matches() function works fine, but the second <Date> element in
the output is empty. So the regex attribute is not working correctly.

The problem cannot be avoided by using a character class such as [...]*,
or even by adding + or * to a single character!

Incidentally, it does recover thereafter. So, if there is a third
group, then that will be output correctly.

Of course, the solution is to just make the intermediate characters
into a group too, even though I don’t want that group. But I still
think this is a bug that is worth noting.
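
For illustration, a minimal sketch of that workaround, assuming the same
<Dates>2004xy2006</Dates> input as above. Only the regex and the index
of the second date group change. The unwanted "xy" becomes group 2, so
the second date is read from regex-group(3):

<xsl:template match="Dates">
 <xsl:analyze-string select="string(.)"
                     regex="(\d\d\d\d)(xy)(\d\d\d\d)" flags="i">
   <xsl:matching-substring>
     <Dates>
       <Date><xsl:value-of select="regex-group(1)"/></Date>
       <!-- group 2 holds the separator and is deliberately ignored -->
       <Date><xsl:value-of select="regex-group(3)"/></Date>
     </Dates>
   </xsl:matching-substring>
 </xsl:analyze-string>
</xsl:template>

This relies on the behaviour described above, that a third group is
output correctly. A non-capturing group is not an option here, because
XSLT 2.0 regular expressions do not support the (?: ) syntax.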

Neil.
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general

--
David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 400314, Charlottesville, VA 22904-4314 USA
Email: [email protected]   Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/