All,

I'll probably submit a formal bug report on this class of problem, but first I'm wondering whether other users out there have noticed similar behavior. The default snippeting in search:search() tends to produce misleading output when the matched text is in an element (say a <p> or <para>) that has mixed content. For example, consider this paragraph from our data:

<p>Charles Yancey (1766–ca. 1825) was a magistrate of <rs>Albemarle County</rs> from 1796, colonel in the local militia, 1806–15, and sheriff, 1821–23. He represented the county in the <name>Virginia House of Delegates</name>, 1814–17. Yancey also operated a tavern, store, mill, and distillery. He corresponded regularly with TJ on subjects ranging from procurement of clover seed and millstones to matters under consideration by the <name>General Assembly</name>, including the incorporation of <name>Central College</name> [... etc.]</p>

Running search:search() with the simple phrase query

        "Central College"

produces this snippet result (omitting @path):

<search:match>Charles Yancey (1766–ca. 1825) was a magistrate of <search:highlight>Central College</search:highlight> </search:match>

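Note that "was a magistrate of Central College" misrepresents the text: the snippet joins the words before the <rs> element directly to the highlighted match, skipping everything in between. There should be an ellipsis after "magistrate of".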

Removing the <rs> tag from "Albemarle County" in the source eliminates the buggy output, so there's definitely an interaction with embedded elements going on. I'm just wondering if others have noticed similar behavior with their content.
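
For anyone who wants to try this against their own content, here is a minimal sketch of the kind of call involved. It assumes the standard Search API module import, default options (no custom transform-results), and that a document containing the paragraph above is loaded in the current database; the trailing path step is only there to pull out the snippet matches.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
    at "/MarkLogic/appservices/search/search.xqy";

(: Phrase query with default options; keep only the snippet matches. :)
search:search('"Central College"')//search:match
```

Comparing the output with and without the <rs> element in the source paragraph should show whether the same interaction occurs in other data.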

David S.

--
David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 400314, Charlottesville, VA 22904-4314 USA
Email: [email protected]   Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/