5.0-3 (RHEL)
On Tue, 8 May 2012, Colleen Whitney wrote:
David, what version are you running?
________________________________________
From: [email protected]
[[email protected]] On Behalf Of David Sewell
[[email protected]]
Sent: Tuesday, May 08, 2012 7:24 AM
To: General Mark Logic Developer Discussion
Subject: [MarkLogic Dev General] Anyone noticing buggy snippeting behavior
with search:search()?
All,
I'm probably going to submit a formal bug report on this class of problem, but
I'm just wondering whether other users out there have noticed similar phenomena.
The default snippeting behavior in search:search() tends to do the wrong thing
in cases where the matched text is in an element (say a <p> or <para>) that has
mixed content. For example, consider this paragraph from our data:
<p>Charles Yancey (1766–ca. 1825) was a magistrate of <rs>Albemarle County</rs>
from 1796, colonel in the local militia, 1806–15, and sheriff, 1821–23. He
represented the county in the <name>Virginia House of Delegates</name>, 1814–17.
Yancey also operated a tavern, store, mill, and distillery. He corresponded
regularly with TJ on subjects ranging from procurement of clover seed and
millstones to matters under consideration by the <name>General Assembly</name>,
including the incorporation of <name>Central College</name> [... etc.]</p>
Running search:search() with a simple query on
"Central College"
as a phrase produces the snippet result (omitting @path):
<search:match>Charles Yancey (1766–ca. 1825) was a magistrate of <search:highlight>Central
College</search:highlight> </search:match>
Note that "was a magistrate of Central College" misrepresents the text: the
snippet silently drops everything between "magistrate of" and the matched
phrase. There should be an ellipsis after "magistrate of".
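
To make the expected output concrete, here is a minimal sketch in Python of the behavior I would expect. It is not MarkLogic's snippeting code and says nothing about how search:search() works internally. The function names (flatten, snippet), the lead_words parameter, the square-bracket stand-in for <search:highlight>, and the abridged SAMPLE paragraph are all assumptions made for illustration. The idea is simply to flatten the mixed content into one string and insert an ellipsis whenever text is skipped between the leading words and the match.

```python
import xml.etree.ElementTree as ET

# Abridged version of the paragraph above (hypothetical test data, not the full text).
SAMPLE = (
    "<p>Charles Yancey (1766–ca. 1825) was a magistrate of "
    "<rs>Albemarle County</rs> from 1796, including the incorporation of "
    "<name>Central College</name></p>"
)


def flatten(xml_text):
    """Concatenate all text nodes in document order and normalize whitespace."""
    # itertext() also yields the text inside child elements such as <rs> and <name>.
    raw = "".join(ET.fromstring(xml_text).itertext())
    return " ".join(raw.split())


def snippet(text, phrase, lead_words=8):
    """Return the leading words plus the matched phrase.

    An ellipsis marks any text skipped between the two. The match is wrapped
    in square brackets as a stand-in for highlighting markup.
    """
    start = text.find(phrase)
    if start < 0:
        return None  # phrase not present
    words_before = text[:start].split()
    lead = " ".join(words_before[:lead_words])
    # Only join directly when nothing was dropped between the lead and the match.
    gap = " ... " if len(words_before) > lead_words else " "
    return f"{lead}{gap}[{phrase}]".strip()


print(snippet(flatten(SAMPLE), "Central College"))
```

With the abridged sample, "Albemarle County" and the rest of the intervening text are dropped from the snippet, but the ellipsis shows that something was omitted, so the result no longer reads as "magistrate of Central College".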
Removing the <rs> tag from "Albemarle County" in the source eliminates the buggy
output, so there's definitely an interaction with embedded elements going on.
I'm just wondering if others have noticed similar behavior with their content.
David S.
--
David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 400314, Charlottesville, VA 22904-4314 USA
Email: [email protected] Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general