This may not be the best solution, but what worked for our system was to 
create an additional element whose value is binary (true/false or on/off), and 
then sort primarily on that binary element and secondarily on the element with 
the 'real' values. We had a similar issue with articles whose publication year 
is either a four-digit year or 'Undefined'. We were able to add the new element 
as part of our yearly reload; if you don't have that luxury, you could use corb 
and run it at a low thread count.
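
To make the two-level sort concrete, here is a minimal sketch in Python rather 
than XQuery. It only illustrates the ordering logic, not how it would be wired 
into MarkLogic. The records, the has_year flag, and the function names are all 
made up for the example.

```python
# Hypothetical records: the year is either a four-digit string or "Undefined".
articles = [
    {"id": "a1", "year": "1998"},
    {"id": "a2", "year": "Undefined"},
    {"id": "a3", "year": "1840"},
    {"id": "a4", "year": "2005"},
]


def add_flag(record):
    """Return a copy of the record with a binary 'has_year' field.

    This stands in for the extra true/false element added at load time.
    """
    flagged = dict(record)
    year = record["year"]
    flagged["has_year"] = len(year) == 4 and year.isdigit()
    return flagged


def sort_defined_first(records, descending=False):
    """Sort on the binary flag first, then on the real value."""
    flagged = [add_flag(r) for r in records]
    # Python's sort is stable, so apply the secondary key first...
    flagged.sort(key=lambda r: r["year"], reverse=descending)
    # ...then the primary key: records with a real year come before the rest.
    flagged.sort(key=lambda r: not r["has_year"])
    return flagged


ascending_ids = [r["id"] for r in sort_defined_first(articles)]
descending_ids = [r["id"] for r in sort_defined_first(articles, descending=True)]
```

Because the flag is the primary key, the 'Undefined' record ends up last 
whether the year itself is sorted ascending or descending.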


Matthew

______________________________________

Matthew Treskon
Digital Services Librarian--DigiTop
National Agricultural Library
US Department of Agriculture, ARS

[email protected]
765-494-8692 (Phone)
765-494-1705 (Fax)

...advancing access to global
information for agriculture.
______________________________________



-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Murray, Gregory
Sent: Tuesday, February 21, 2012 11:20 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Search API results: sorting empty last

Colleen,

Thanks for the info. It looks like the problem is indeed using a non-default 
collation. Here's a (real but greatly simplified for illustration) example 
document:

<?xml version="1.0" encoding="UTF-8"?>
<doc xmlns="http://digital.library.ptsem.edu/ia">
  <metadata>
    <id>apostlescreedlor00geeg</id>
    <name>Geegby, W. B.</name>
    <title>The Apostles' Creed and the Lord's prayer in the Kru dialect</title>
    <date>1840</date>
    <language code="eng">English</language>
    <class>Old Testament</class>
  </metadata>
</doc>

I have range element indexes on date (gYear), language (string with default 
collation), and name (string with collation 
http://marklogic.com/collation//AS/T0020).

In the <options> to search:search(), my (again, real but simplified for 
illustration) sort operator looks like this:

      <operator name="sort">
        <state name="name">
          <sort-order type="xs:string" 
collation="http://marklogic.com/collation//AS/T0020">
            <element ns="http://digital.library.ptsem.edu/ia" name="name"/>
          </sort-order>
        </state>
        <state name="date">
          <sort-order type="xs:gYear">
            <element ns="http://digital.library.ptsem.edu/ia" name="date"/>
          </sort-order>
        </state>
        <state name="language">
          <sort-order type="xs:string">
            <element ns="http://digital.library.ptsem.edu/ia" name="language"/>
          </sort-order>
        </state>
      </operator>

If I sort by date (if the qtext passed to search:search() includes "sort:date") 
then I get empty last (documents with no <date> element occur last in the 
search results), as I would expect and prefer. Similarly, if I sort by language 
I get empty last. But if I sort by name, I get empty first.

Any way around this other than switching to the default collation for <name>?

Many thanks,
Greg


On Feb 20, 2012, at 7:47 PM, Colleen Whitney wrote:

> Hi Greg,
>
> The Search API doesn't have support as yet for specifying "empty least" or 
> "empty greatest" on sorting.
>
> You can specify a *direction* as an attribute on the <sort-order> element 
> (direction="ascending"  or direction="descending").  When descending, the 
> server defaults to empty least, and when ascending it defaults to empty 
> greatest, so I think it *should* actually default to them coming out last.  
> But collation is important here, and it's possible that the collation you're 
> using here could be involved.  If you have a very small set of test "name" 
> elements you can share, along with how they're sorting, it might be helpful 
> in understanding what you're seeing.
>
> --Colleen
>
> ________________________________________
> From: [email protected] 
> [[email protected]] On Behalf Of Murray, Gregory 
> [[email protected]]
> Sent: Monday, February 20, 2012 10:37 AM
> To: MarkLogic Developer Discussion
> Subject: [MarkLogic Dev General] Search API results: sorting empty last
>
> When using the Search API and using a sort operator such as this:
>
>      <search:operator name="sort">
>        <search:state name="name">
>          <search:sort-order type="xs:string" 
> collation="http://marklogic.com/collation//AS/T0020">
>            <search:element ns="http://example.com/ns" name="name"/>
>          </search:sort-order>
>        </search:state>
>        <!-- ... -->
>      </search:operator>
>
> is there a way to specify that documents with a missing or empty <name> 
> element should occur *last* in the sorted search results? It appears that 
> empty values occur at the top of the search results by default.
>
> Thanks,
> Greg
>
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general


This electronic message contains information generated by the USDA solely for 
the intended recipients. Any unauthorized interception of this message or the 
use or disclosure of the information it contains may violate the law and 
subject the violator to civil or criminal penalties. If you believe you have 
received this message in error, please notify the sender and delete the email 
immediately.
