This may not be the best solution, but one that worked for our system was to create another element whose value is binary (true/false or on/off), then sort primarily on that binary element and secondarily on the element with the 'real' values. We had a similar issue with articles whose publication year is either a four-digit year or 'Undefined'. We were able to add the new element as part of a yearly reload, but if you don't have that luxury you could use CORB and run it at a low thread count.
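
For illustration, here is a rough sketch of how that two-level sort might look in the Search API options from the original question. It assumes a hypothetical has-name element (not something in Greg's data) that holds "true" or "false" and has its own string range index in the default collation. It also assumes that a <state> accepts more than one <sort-order>, applied in order as primary and secondary keys; please check that against the Search API documentation for your server version before relying on it.

```xml
<!-- Sketch only: "has-name" is a hypothetical flag element added to each
     document ("true" when <name> is present, "false" otherwise). -->
<operator name="sort">
  <state name="name">
    <!-- Primary key: the flag. Descending puts "true" before "false",
         so documents without a name fall to the end. -->
    <sort-order type="xs:string" direction="descending">
      <element ns="http://digital.library.ptsem.edu/ia" name="has-name"/>
    </sort-order>
    <!-- Secondary key: the real value, using the existing collation. -->
    <sort-order type="xs:string" direction="ascending"
        collation="http://marklogic.com/collation//AS/T0020">
      <element ns="http://digital.library.ptsem.edu/ia" name="name"/>
    </sort-order>
  </state>
</operator>
```

Because every document carries the flag, the empty-first/empty-last behavior of the non-default collation should no longer decide where the nameless documents land; only the flag does.
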
Matthew
______________________________________
Matthew Treskon
Digital Services Librarian--DigiTop
National Agricultural Library
US Department of Agriculture, ARS
[email protected]
765-494-8692 (Phone)
765-494-1705 (Fax)
...advancing access to global information for agriculture.
______________________________________

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Murray, Gregory
Sent: Tuesday, February 21, 2012 11:20 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Search API results: sorting empty last

Colleen,

Thanks for the info. It looks like the problem is indeed using a non-default collation. Here's a (real but greatly simplified for illustration) example document:

<?xml version="1.0" encoding="UTF-8"?>
<doc xmlns="http://digital.library.ptsem.edu/ia">
  <metadata>
    <id>apostlescreedlor00geeg</id>
    <name>Geegby, W. B.</name>
    <title>The Apostles' Creed and the Lord's prayer in the Kru dialect</title>
    <date>1840</date>
    <language code="eng">English</language>
    <class>Old Testament</class>
  </metadata>
</doc>

I have range element indexes on date (gYear), language (string with default collation), and name (string with collation http://marklogic.com/collation//AS/T0020).
In the <options> to search:search(), my (again, real but simplified for illustration) sort operator looks like this:

<operator name="sort">
  <state name="name">
    <sort-order type="xs:string" collation="http://marklogic.com/collation//AS/T0020">
      <element ns="http://digital.library.ptsem.edu/ia" name="name"/>
    </sort-order>
  </state>
  <state name="date">
    <sort-order type="xs:gYear">
      <element ns="http://digital.library.ptsem.edu/ia" name="date"/>
    </sort-order>
  </state>
  <state name="language">
    <sort-order type="xs:string">
      <element ns="http://digital.library.ptsem.edu/ia" name="language"/>
    </sort-order>
  </state>
</operator>

If I sort by date (if the qtext passed to search:search() includes "sort:date") then I get empty last (documents with no <date> element occur last in the search results), as I would expect and prefer. Similarly, if I sort by language I get empty last. But if I sort by name, I get empty first. Any way around this other than switching to the default collation for <name>?

Many thanks,
Greg

On Feb 20, 2012, at 7:47 PM, Colleen Whitney wrote:

> Hi Greg,
>
> The Search API doesn't have support as yet for specifying "empty least" or
> "empty greatest" on sorting.
>
> You can specify a *direction* as an attribute on the <sort-order> element
> (direction="ascending" or direction="descending"). When descending, the
> server defaults to empty least, and when ascending it defaults to empty
> greatest, so I think it *should* actually default to them coming out last.
> But collation is important here, and it's possible that the collation you're
> using here could be involved. If you have a very small set of test "name"
> elements you can share, along with how they're sorting, it might be helpful
> in understanding what you're seeing.
>
> --Colleen
>
> ________________________________________
> From: [email protected]
> [[email protected]] On Behalf Of Murray, Gregory
> [[email protected]]
> Sent: Monday, February 20, 2012 10:37 AM
> To: MarkLogic Developer Discussion
> Subject: [MarkLogic Dev General] Search API results: sorting empty last
>
> When using the Search API and using a sort operator such as this:
>
> <search:operator name="sort">
>   <search:state name="name">
>     <search:sort-order type="xs:string"
>         collation="http://marklogic.com/collation//AS/T0020">
>       <search:element ns="http://example.com/ns" name="name"/>
>     </search:sort-order>
>   </search:state>
>   <!-- ... -->
> </search:operator>
>
> is there a way to specify that documents with a missing or empty <name>
> element should occur *last* in the sorted search results? It appears that
> empty values occur at the top of the search results by default.
>
> Thanks,
> Greg
>
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general

This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately.
