Thanks, these are good ideas and they make sense, but as I dig a little deeper
into the data I see something odd that doesn't seem to work the way I would
expect.
Suppose I inspect a document with:
doc("/data-sources/lawcom-contrib/sites/almstaff/2017/03/21/no-womans-land-cybersecurity-industry-suffers-from-gender-imbalance-discrimination.xml")
In it I can see a single HTML node, starting with
<HTML xmlns:i="incisive-repository">
... a bunch of <p> child nodes, and then </HTML>
That is directly followed by
<document xml:lang="en"
xmlns:occurrenceattr="http://luxid.temis.com/occurrence/attribute"
xmlns:entityattr="http://luxid.temis.com/entity/attribute"
xmlns:entity="http://luxid.temis.com/entity"
xmlns:category="http://luxid.temis.com/category" xmlns="">
... a bunch of <entity> nodes and the same <p> nodes as in the HTML set.
So as far as I can tell, there's only one HTML node in the doc.
But when I run a directory query to return docs and write out the value of
$doc//ir:HTML
I first get
<i:HTML xmlns:i="incisive-repository">
... a bunch of <p> child nodes, ending with </i:HTML>
Then
<HTML xmlns="incisive-repository" xmlns:i="incisive-repository">
... image
+ <i xmlns="http://www.w3.org/1999/xhtml">
... entity nodes and a duplicate of the <p> children from the first.
How come there's only one HTML node in the doc when I inspect it, but when I
run a directory query and write out the HTML value with its descendants, I get
more than one? Do the XSLT approach and the unwrap suggestion still make sense
in that context?
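One thing worth checking is what `$doc//ir:HTML` actually matches. The `//` (descendant) step returns every ir:HTML element at any depth, including one nested inside another, and serializing an outer match also prints everything inside it. The sketch below uses a small made-up sample (not the real document) to contrast all matches with only the outermost ones, and uses xdmp:path to report where each match sits. To try it on the real document, bind `$doc` with the `doc()` call above instead of the sample.

```xquery
xquery version "1.0-ml";
declare namespace ir = "incisive-repository";

(: Hypothetical sample: one ir:HTML nested inside another.
   For the real document, bind $doc with fn:doc($uri) using the URI above. :)
let $doc := document {
  <ir:HTML>
    <p>outer</p>
    <ir:HTML><p>inner</p></ir:HTML>
  </ir:HTML>
}
(: Every ir:HTML at any depth, nested ones included. :)
let $all := $doc//ir:HTML
(: Only ir:HTML elements that are not inside another ir:HTML. :)
let $outermost := $doc//ir:HTML[fn:not(ancestor::ir:HTML)]
return (
  (: Where each match lives in the tree. :)
  for $html in $all return xdmp:path($html),
  fn:count($all),
  fn:count($outermost)
)
```

If the counts differ on the real document, the extra "HTML" in the output is a nested element rather than a second top-level one, and a filter like the `$outermost` predicate (or an XSLT template keyed on `ancestor::ir:HTML`) is the thing to target.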
From: <[email protected]> on behalf of Geert Josten
<[email protected]>
Reply-To: MarkLogic <[email protected]>
Date: Monday, March 20, 2017 at 8:49 AM
To: MarkLogic <[email protected]>
Subject: Re: [MarkLogic Dev General] Using RegEx in xQuery
You may want to unwrap entity:entity and suppress entity:entityattr instead,
but otherwise this should work just fine all the way back to at least MarkLogic
5. :)
Cheers
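Geert's suggestion can be sketched as two extra templates on top of the identity transform: one that unwraps entity:entity (drops the element but keeps its content) and one that matches attributes in the entityattr namespace and outputs nothing. The namespace URIs are taken from the <document> element quoted earlier in this thread; the sample input and the `entityattr:form` attribute name are hypothetical, so bind `$doc` to the real document to try it.

```xquery
xquery version "1.0-ml";

(: Hypothetical sample input; the entity markup shape is an assumption.
   In practice, bind $doc to the real document with fn:doc($uri). :)
let $doc := document {
  <document xmlns:entity="http://luxid.temis.com/entity"
            xmlns:entityattr="http://luxid.temis.com/entity/attribute">
    <p>A <entity:entity entityattr:form="gender imbalance">gender imbalance</entity:entity> persists.</p>
  </document>
}
let $xslt :=
  <xsl:stylesheet version="2.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:entity="http://luxid.temis.com/entity"
      xmlns:entityattr="http://luxid.temis.com/entity/attribute">
    <!-- Identity template: copy everything not matched by a more specific rule. -->
    <xsl:template match="@*|node()">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:template>
    <!-- Unwrap entity:entity: output its child nodes without the element itself. -->
    <xsl:template match="entity:entity">
      <xsl:apply-templates select="node()"/>
    </xsl:template>
    <!-- Suppress any attribute in the entityattr namespace. -->
    <xsl:template match="@entityattr:*"/>
  </xsl:stylesheet>
let $result := xdmp:xslt-eval($xslt, $doc)
return $result
```

The unwrap template selects only `node()` (not `@*|node()`), so entity:entity's own attributes are dropped along with the element instead of being copied onto the surrounding <p>.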
From: <[email protected]> on behalf of Christopher Hamlin
<[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Monday, March 20, 2017 at 4:29 PM
To: MarkLogic Developer Discussion <[email protected]>
Subject: Re: [MarkLogic Dev General] Using RegEx in xQuery
I don't know offhand of any changes in XSLT between 7 and 8.
Something like this in 8 is what I was thinking; I don't know if it's really
what you need:
let $doc := (: blah blah blah :)
let $xslt :=
  <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:ir="incisive-repository">
    <!-- Identity template: copy every node and attribute as-is. -->
    <xsl:template match="@*|node()">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:template>
    <!-- Drop any ir:HTML that has an earlier ir:HTML sibling, keeping only the first. -->
    <xsl:template match="ir:HTML[preceding-sibling::ir:HTML]"></xsl:template>
  </xsl:stylesheet>
return xdmp:xslt-eval($xslt, $doc)
_______________________________________________
General mailing list
[email protected]
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general