Thanks – these are good ideas and make sense, but as I dig into the data a 
little deeper I see something odd that doesn’t seem to be working the way I 
would expect.

Assume I inspected a document via:
doc("/data-sources/lawcom-contrib/sites/almstaff/2017/03/21/no-womans-land-cybersecurity-industry-suffers-from-gender-imbalance-discrimination.xml")

In that I can see 1 single HTML node starting with
<HTML xmlns:i="incisive-repository">
... bunch of <p> child nodes and then </HTML>

Then directly followed by
<document xml:lang="en" 
xmlns:occurrenceattr="http://luxid.temis.com/occurrence/attribute" 
xmlns:entityattr="http://luxid.temis.com/entity/attribute" 
xmlns:entity="http://luxid.temis.com/entity" 
xmlns:category="http://luxid.temis.com/category" xmlns="">
... bunch of <entity> nodes and the same <p> nodes as in the HTML set.

So in my view, there’s only 1 HTML node in the doc.

But when I do a directory query to return docs and write the value for 
$doc//ir:HTML

I get first
<i:HTML xmlns:i="incisive-repository">
.. bunch of <p> child nodes and ending with </i:HTML>

Then
<HTML xmlns="incisive-repository" xmlns:i="incisive-repository">
.. image
+ <i xmlns="http://www.w3.org/1999/xhtml">
... entity nodes and a duplicate of the <p> children in the first.

Why is there only 1 HTML node when I inspect the doc directly, but more than 1 
when I run a directory query and write out $doc//ir:HTML with its descendants?

Does the XSLT notation and unwrap suggestion still make sense given that 
context?
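
One thing that may be relevant here (a guess, not something confirmed in this thread): an XPath name test such as ir:HTML matches on namespace URI plus local name, not on the prefix used in the serialized markup. The sketch below uses a small hypothetical stand-in document, not the real one, to show that an element written as <i:HTML> and one written as <HTML xmlns="incisive-repository"> are both selected by //ir:HTML.

```xquery
xquery version "1.0";
declare namespace ir = "incisive-repository";

(: Hypothetical sample data; the element contents are invented for illustration. :)
let $doc :=
  <root>
    <i:HTML xmlns:i="incisive-repository"><p>first copy</p></i:HTML>
    <HTML xmlns="incisive-repository"><p>second copy</p></HTML>
  </root>
return (
  (: Both elements are in the "incisive-repository" namespace, so both match. :)
  count($doc//ir:HTML),
  (: Namespace URI of each match, regardless of how the prefix was written. :)
  for $h in $doc//ir:HTML return namespace-uri($h),
  (: Taking only the first match in document order. :)
  count(($doc//ir:HTML)[1])
)
```

If the counts differ between the two ways of looking at the document, comparing namespace-uri() and local-name() for each element, rather than the serialized prefixes, may help pin down which elements are really in the incisive-repository namespace.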


From: <[email protected]> on behalf of Geert Josten 
<[email protected]>
Reply-To: MarkLogic <[email protected]>
Date: Monday, March 20, 2017 at 8:49 AM
To: MarkLogic <[email protected]>
Subject: Re: [MarkLogic Dev General] Using RegEx in xQuery

You may want to unwrap entity:entity and suppress entity:entityattr instead, 
but otherwise this should work just fine all the way down to at least MarkLogic 
5. :)

Cheers

From: <[email protected]> on behalf of Christopher Hamlin <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Monday, March 20, 2017 at 4:29 PM
To: MarkLogic Developer Discussion <[email protected]>
Subject: Re: [MarkLogic Dev General] Using RegEx in xQuery

I don't know offhand of any changes in XSLT between 7 and 8.

Something like this in 8 is what I was thinking, don't know if it is really 
what you need:

let $doc := (: blah blah blah :)
let $xslt :=
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
xmlns:ir="incisive-repository">
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="ir:HTML[preceding-sibling::ir:HTML]"></xsl:template>
</xsl:stylesheet>
return xdmp:xslt-eval ($xslt, $doc)
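
Before relying on the empty template above, it may be worth checking which elements its match pattern would actually drop. The sketch below is plain XQuery over a hypothetical sample (the wrapper element and contents are invented). It illustrates that the preceding-sibling predicate only catches an ir:HTML that has an earlier ir:HTML sibling under the same parent, so a duplicate nested elsewhere in the tree would be left in place.

```xquery
xquery version "1.0";
declare namespace ir = "incisive-repository";

(: Hypothetical sample: two sibling ir:HTML elements plus one nested deeper. :)
let $doc :=
  <root xmlns:ir="incisive-repository">
    <ir:HTML><p>a</p></ir:HTML>
    <ir:HTML><p>b</p></ir:HTML>
    <wrapper><ir:HTML><p>c</p></ir:HTML></wrapper>
  </root>
return (
  (: Every ir:HTML anywhere in the tree. :)
  count($doc//ir:HTML),
  (: Only those with an earlier ir:HTML sibling, i.e. what the template would remove. :)
  count($doc//ir:HTML[preceding-sibling::ir:HTML])
)
```

Running the same two counts against the real document should show whether the sibling-based pattern covers the duplicate, or whether a different pattern is needed.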