Colleen,

You were right. Using the default collation didn't help. The real problem was that some of our documents had empty <name/> elements. If <name> is present but empty, those documents appear first in the sorted list. If <name> is absent, they appear last, as desired.
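To illustrate the distinction outside the Search API, here is a minimal sketch in plain XQuery. The three <doc> elements and their id attributes are hypothetical, and the sketch uses a FLWOR "order by" clause with the default collation rather than search:search(), so it only shows the general empty-string versus empty-sequence behavior, not how the Search API sorts internally.

```xquery
xquery version "1.0";

(: Hypothetical test data: a populated <name>, an empty <name/>,
   and a document with no <name> element at all. :)
let $docs := (
  <doc id="populated"><name>Geegby, W. B.</name></doc>,
  <doc id="empty-element"><name/></doc>,
  <doc id="absent"/>
)
for $d in $docs
(: $d/name atomizes to the zero-length string for <name/>,
   but to the empty sequence when <name> is absent.
   "empty greatest" only affects the empty sequence. :)
order by $d/name ascending empty greatest
return string($d/@id)
```

Under standard XQuery ordering rules this should return "empty-element", "populated", "absent": the zero-length string sorts ahead of every other string, while the missing element is pushed to the end by "empty greatest".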
Personally I think that ideally the Search API would sort the two situations identically, but at least now I know there's a rationale to it. I suppose it's analogous to a relational database treating an empty string value differently than NULL.

Thanks,
Greg

On Feb 21, 2012, at 11:57 AM, Colleen Whitney wrote:

> I don't think the fact that it's non-default is the issue. My off the cuff
> guess is that it has to do with whitespace significance with this particular
> collation. You could do some small scale experiments with collations to see
> if you can arrive at one that yields the desired result. I'll file an RFE on
> controlling empty in sort order, meanwhile.
>
> Sent from my iPhone
>
> On Feb 21, 2012, at 8:19 AM, "Murray, Gregory" <[email protected]>
> wrote:
>
>> Colleen,
>>
>> Thanks for the info. It looks like the problem is indeed using a non-default
>> collation. Here's a (real but greatly simplified for illustration) example
>> document:
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <doc xmlns="http://digital.library.ptsem.edu/ia">
>>   <metadata>
>>     <id>apostlescreedlor00geeg</id>
>>     <name>Geegby, W. B.</name>
>>     <title>The Apostles' Creed and the Lord's prayer in the Kru dialect</title>
>>     <date>1840</date>
>>     <language code="eng">English</language>
>>     <class>Old Testament</class>
>>   </metadata>
>> </doc>
>>
>> I have range element indexes on date (gYear), language (string with default
>> collation), and name (string with collation
>> http://marklogic.com/collation//AS/T0020).
>>
>> In the <options> to search:search(), my (again, real but simplified for
>> illustration) sort operator looks like this:
>>
>> <operator name="sort">
>>   <state name="name">
>>     <sort-order type="xs:string"
>>         collation="http://marklogic.com/collation//AS/T0020">
>>       <element ns="http://digital.library.ptsem.edu/ia" name="name"/>
>>     </sort-order>
>>   </state>
>>   <state name="date">
>>     <sort-order type="xs:gYear">
>>       <element ns="http://digital.library.ptsem.edu/ia" name="date"/>
>>     </sort-order>
>>   </state>
>>   <state name="language">
>>     <sort-order type="xs:string">
>>       <element ns="http://digital.library.ptsem.edu/ia" name="language"/>
>>     </sort-order>
>>   </state>
>> </operator>
>>
>> If I sort by date (if the qtext passed to search:search() includes
>> "sort:date") then I get empty last (documents with no <date> element occur
>> last in the search results), as I would expect and prefer. Similarly, if I
>> sort by language I get empty last. But if I sort by name, I get empty first.
>>
>> Any way around this other than switching to the default collation for <name>?
>>
>> Many thanks,
>> Greg
>>
>>
>> On Feb 20, 2012, at 7:47 PM, Colleen Whitney wrote:
>>
>>> Hi Greg,
>>>
>>> The Search API doesn't have support as yet for specifying "empty least" or
>>> "empty greatest" on sorting.
>>>
>>> You can specify a *direction* as an attribute on the <sort-order> element
>>> (direction="ascending" or direction="descending"). When descending, the
>>> server defaults to empty least, and when ascending it defaults to empty
>>> greatest, so I think it *should* actually default to them coming out last.
>>> But collation is important here, and it's possible that the collation
>>> you're using here could be involved. If you have a very small set of test
>>> "name" elements you can share, along with how they're sorting, it might be
>>> helpful in understanding what you're seeing.
>>>
>>> --Colleen
>>>
>>> ________________________________________
>>> From: [email protected]
>>> [[email protected]] On Behalf Of Murray, Gregory
>>> [[email protected]]
>>> Sent: Monday, February 20, 2012 10:37 AM
>>> To: MarkLogic Developer Discussion
>>> Subject: [MarkLogic Dev General] Search API results: sorting empty last
>>>
>>> When using the Search API and using a sort operator such as this:
>>>
>>> <search:operator name="sort">
>>>   <search:state name="name">
>>>     <search:sort-order type="xs:string"
>>>         collation="http://marklogic.com/collation//AS/T0020">
>>>       <search:element ns="http://example.com/ns" name="name"/>
>>>     </search:sort-order>
>>>   </search:state>
>>>   <!-- ... -->
>>> </search:operator>
>>>
>>> is there a way to specify that documents with a missing or empty <name>
>>> element should occur *last* in the sorted search results? It appears that
>>> empty values occur at the top of the search results by default.
>>>
>>> Thanks,
>>> Greg
>>>
>>> _______________________________________________
>>> General mailing list
>>> [email protected]
>>> http://developer.marklogic.com/mailman/listinfo/general
