I think you should file the report. When you do, it will be important to note whether <rs> is configured for phrase-through or phrase-around (I suspect it is).
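In case it helps with the report, here is a minimal sketch of one way to list the phrase-through and phrase-around settings on a database. It assumes the query is run against the content database that holds the documents (so that xdmp:database() returns the right ID) and that the user has privileges to read the admin configuration; adjust the database lookup if that assumption does not hold.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: Assumption: this runs against the content database in question. :)
let $config := admin:get-configuration()
let $db := xdmp:database()
return (
  (: Elements through which phrases are allowed to continue :)
  admin:database-get-phrase-throughs($config, $db),
  (: Elements that are skipped over when matching phrases :)
  admin:database-get-phrase-arounds($config, $db)
)
```

If an entry for rs shows up in either list, that would be worth including in the report.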
________________________________________
From: [email protected] [[email protected]] On Behalf Of David Sewell [[email protected]]
Sent: Tuesday, May 08, 2012 7:35 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Anyone noticing buggy snippeting behavior with search:search()?

5.0-3 (RHEL)

On Tue, 8 May 2012, Colleen Whitney wrote:

> David, what version are you running?
> ________________________________________
> From: [email protected] [[email protected]] On Behalf Of David Sewell [[email protected]]
> Sent: Tuesday, May 08, 2012 7:24 AM
> To: General Mark Logic Developer Discussion
> Subject: [MarkLogic Dev General] Anyone noticing buggy snippeting behavior with search:search()?
>
> All,
>
> I'm probably going to submit a formal bug report on this class of problem, but
> I'm just wondering whether other users out there have noticed similar
> phenomena.
>
> The default snippeting behavior in search:search() tends to do the wrong thing
> in cases where the matched text is in an element (say a <p> or <para>) that has
> mixed content. For example, consider this paragraph from our data:
>
> <p>Charles Yancey (1766–ca. 1825) was a magistrate of <rs>Albemarle County</rs>
> from 1796, colonel in the local militia, 1806–15, and sheriff, 1821–23. He
> represented the county in the <name>Virginia House of Delegates</name>, 1814–17.
> Yancey also operated a tavern, store, mill, and distillery. He corresponded
> regularly with TJ on subjects ranging from procurement of clover seed and
> millstones to matters under consideration by the <name>General Assembly</name>,
> including the incorporation of <name>Central College</name> [... etc.]</p>
>
> Running search:search() with a simple query on
>
> "Central College"
>
> as a phrase produces the snippet result (omitting @path):
>
> <search:match>Charles Yancey (1766–ca. 1825) was a magistrate of
> <search:highlight>Central College</search:highlight> </search:match>
>
> Note that "was a magistrate of Central College" misrepresents the text. There
> should be an ellipsis after "magistrate of".
>
> Removing the <rs> tag from "Albemarle County" in the source eliminates the buggy
> output, so there's definitely an interaction with embedded elements going on.
> I'm just wondering if others have noticed similar behavior with their content.
>
> David S.
>
> --
> David Sewell, Editorial and Technical Manager
> ROTUNDA, The University of Virginia Press
> PO Box 400314, Charlottesville, VA 22904-4314 USA
> Email: [email protected] Tel: +1 434 924 9973
> Web: http://rotunda.upress.virginia.edu/
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
