Colleen,

You were right. Using the default collation didn't help. The real problem was 
that some of our documents had empty <name/> elements. Documents in which 
<name> is present but empty appear first in the sorted list; documents in 
which <name> is absent appear last, as desired.

Ideally, I think the Search API would sort the two cases identically, but at 
least now I know there's a rationale for it. I suppose it's analogous to a 
relational database treating an empty string differently from NULL.
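
For anyone finding this thread later, here is a minimal Python sketch of the 
distinction. It is only an illustration, not how MarkLogic implements sorting: 
the records, the use of None to stand for an absent <name>, and both key 
functions are my own assumptions. The first ordering mimics what I observed 
(empty strings first, absent values last); the second is the behavior I would 
have preferred, with empty and absent values both last.

```python
# Hypothetical records: None stands for an absent <name> element,
# "" stands for a <name/> element that is present but empty.
docs = [
    {"id": "a", "name": "Geegby, W. B."},
    {"id": "b", "name": ""},
    {"id": "c", "name": None},
    {"id": "d", "name": "Adams, J."},
]


def absent_last_key(doc):
    """Absent values sort last; "" is an ordinary string and sorts first."""
    name = doc["name"]
    return (name is None, name or "")


def empty_and_absent_last_key(doc):
    """Treat empty and absent values alike, placing both after real names."""
    name = doc["name"]
    return (not name, name or "")


observed = [d["id"] for d in sorted(docs, key=absent_last_key)]
preferred = [d["id"] for d in sorted(docs, key=empty_and_absent_last_key)]

print(observed)   # ['b', 'd', 'a', 'c']
print(preferred)  # ['d', 'a', 'b', 'c']
```

In practice the simpler fix on our side is to make sure documents never 
contain an empty <name/> in the first place.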

Thanks,
Greg


On Feb 21, 2012, at 11:57 AM, Colleen Whitney wrote:

> I don't think the fact that it's non-default is the issue. My off-the-cuff 
> guess is that it has to do with whitespace significance in this particular 
> collation. You could do some small-scale experiments with collations to see 
> if you can arrive at one that yields the desired result. Meanwhile, I'll file 
> an RFE for controlling where empty values fall in the sort order.
> 
> Sent from my iPhone
> 
> On Feb 21, 2012, at 8:19 AM, "Murray, Gregory" <[email protected]> 
> wrote:
> 
>> Colleen,
>> 
>> Thanks for the info. It looks like the problem is indeed using a non-default 
>> collation. Here's a (real but greatly simplified for illustration) example 
>> document:
>> 
>> <?xml version="1.0" encoding="UTF-8"?>
>> <doc xmlns="http://digital.library.ptsem.edu/ia">
>> <metadata>
>>   <id>apostlescreedlor00geeg</id>
>>   <name>Geegby, W. B.</name>
>>   <title>The Apostles' Creed and the Lord's prayer in the Kru dialect</title>
>>   <date>1840</date>
>>   <language code="eng">English</language>
>>   <class>Old Testament</class>
>> </metadata>
>> </doc>
>> 
>> I have range element indexes on date (gYear), language (string with default 
>> collation), and name (string with collation 
>> http://marklogic.com/collation//AS/T0020).
>> 
>> In the <options> to search:search(), my (again, real but simplified for 
>> illustration) sort operator looks like this:
>> 
>>     <operator name="sort">
>>       <state name="name">
>>         <sort-order type="xs:string" collation="http://marklogic.com/collation//AS/T0020">
>>           <element ns="http://digital.library.ptsem.edu/ia" name="name"/>
>>         </sort-order>
>>       </state>
>>       <state name="date">
>>         <sort-order type="xs:gYear">
>>           <element ns="http://digital.library.ptsem.edu/ia" name="date"/>
>>         </sort-order>
>>       </state>
>>       <state name="language">
>>         <sort-order type="xs:string">
>>           <element ns="http://digital.library.ptsem.edu/ia" name="language"/>
>>         </sort-order>
>>       </state>
>>     </operator>
>> 
>> If I sort by date (if the qtext passed to search:search() includes 
>> "sort:date") then I get empty last (documents with no <date> element occur 
>> last in the search results), as I would expect and prefer. Similarly, if I 
>> sort by language I get empty last. But if I sort by name, I get empty first.
>> 
>> Any way around this other than switching to the default collation for <name>?
>> 
>> Many thanks,
>> Greg
>> 
>> 
>> On Feb 20, 2012, at 7:47 PM, Colleen Whitney wrote:
>> 
>>> Hi Greg,
>>> 
>>> The Search API doesn't have support as yet for specifying "empty least" or 
>>> "empty greatest" on sorting.  
>>> 
>>> You can specify a *direction* as an attribute on the <sort-order> element 
>>> (direction="ascending"  or direction="descending").  When descending, the 
>>> server defaults to empty least, and when ascending it defaults to empty 
>>> greatest, so I think it *should* actually default to them coming out last.  
>>> But collation is important here, and it's possible that the collation 
>>> you're using here could be involved.  If you have a very small set of test 
>>> "name" elements you can share, along with how they're sorting, it might be 
>>> helpful in understanding what you're seeing.
>>> 
>>> --Colleen
>>> 
>>> ________________________________________
>>> From: [email protected] 
>>> [[email protected]] On Behalf Of Murray, Gregory 
>>> [[email protected]]
>>> Sent: Monday, February 20, 2012 10:37 AM
>>> To: MarkLogic Developer Discussion
>>> Subject: [MarkLogic Dev General] Search API results: sorting empty last
>>> 
>>> When using the Search API and using a sort operator such as this:
>>> 
>>>    <search:operator name="sort">
>>>      <search:state name="name">
>>>        <search:sort-order type="xs:string" collation="http://marklogic.com/collation//AS/T0020">
>>>          <search:element ns="http://example.com/ns" name="name"/>
>>>        </search:sort-order>
>>>      </search:state>
>>>      <!-- ... -->
>>>    </search:operator>
>>> 
>>> is there a way to specify that documents with a missing or empty <name> 
>>> element should occur *last* in the sorted search results? It appears that 
>>> empty values occur at the top of the search results by default.
>>> 
>>> Thanks,
>>> Greg
>>> 
>>> _______________________________________________
>>> General mailing list
>>> [email protected]
>>> http://developer.marklogic.com/mailman/listinfo/general
>> 

