Re: [MarkLogic Dev General] Searching for first letter of an element

Murray, Gregory Thu, 03 Mar 2011 13:24:58 -0800

On Mar 3, 2011, at 3:42 PM, Kelly Stirman wrote:

> Hi Greg,
> 
> Sounds like you are using the SearchAPI?
> 
> If so, you can configure your options node to only return results for facets, 
> and not the documents themselves. This would give you a list of titles.
>


Kelly,

Yes, I'm using the Search API. For me, the problem with facets is that I need 
the actual search results for display purposes. I need to display more than 
just the title, so I use the actual <search:result> elements and their @index 
and @uri attributes.

> I think you can also configure your sortTitle facet to only return values for 
> a specific bucket, which in your case would be the letter the user selected. 
> You would have to construct your options node dynamically, but that is easy 
> enough.
> 
> As for the diacritics, have you looked at the collation builder to create a 
> collation that will order your values per your requirements?
> 

I've looked at it briefly. If the diacritic-insensitive option does what I 
think it does (namely, to consider a letter with diacritics as equal to its 
analogous or "decomposed" ASCII letter), then that option should work.
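
To make concrete what I mean by "decomposed", here is a minimal sketch in 
plain Python (not MarkLogic code, and not a claim about how the collation 
builder works internally). The helper name first_letter_key is hypothetical; 
it only illustrates Unicode canonical decomposition (NFD) followed by dropping 
the combining marks:

```python
import unicodedata


def first_letter_key(title: str) -> str:
    """Return the uppercase base character a title would be filed under.

    Hypothetical helper for illustration only.
    """
    # NFD splits a precomposed character such as "É" into the base
    # letter "E" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFD", title.strip())
    for ch in decomposed:
        # combining() is non-zero for combining marks (accents); skip them.
        if unicodedata.combining(ch):
            continue
        return ch.upper()
    # Empty or whitespace-only title: nothing to file under.
    return ""


if __name__ == "__main__":
    for t in ["Ecclesiology", "École biblique", "étude", "Zwingli"]:
        print(t, "->", first_letter_key(t))
```

Note that letters with no canonical decomposition (for example "Ø") come back 
unchanged under this approach, so a collation-based solution may still behave 
differently.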

Many thanks for the suggestions!
Greg

> Kelly
> 
> On 3 Mar 2011, at 06:55 , Murray, Gregory wrote:
> 
>> Hello,
>> 
>> I'm developing a web application in which I want to provide a 
>> browse-by-title feature with an alpha wheel -- by which I mean a row of 
>> links labeled A, B, C, etc. that allow the user to click on a letter and get 
>> back all titles starting with that letter. Under the hood I need to 
>> implement this feature as a search. That is, I don't want to retrieve all 
>> titles and then filter out the titles starting with a particular letter. 
>> That won't scale, and it just seems so inelegant and overwrought. It also 
>> won't work with my pagination code, where I provide links allowing the user 
>> to page through the results in pages of $page-length, because that code 
>> relies on the <search:response> document that search:search() returns 
>> (including relying on @total).
>> 
>> Also, ideally the solution to this problem should include Unicode 
>> normalization, specifically decomposition. Currently we're building a pilot 
>> project, but we expect to have tens of thousands of documents eventually, 
>> some of which might not be in English. I need the search results to include 
>> documents whose title starts with a non-ASCII character, such as a letter 
>> with a diacritical mark. Simply put, when the user clicks the "E" link I 
>> need to retrieve titles starting with "E" but also ones starting with "É" 
>> etc.
>> 
>> In the database config, I've got an element range index on the relevant 
>> element, which in our documents is <sortTitle>, containing the title with 
>> initial articles like a/an/the stripped off.
>> 
>> My first thought was to modify the XML documents themselves to include an 
>> attribute containing the (Unicode normalized) first letter of <sortTitle>. I 
>> assume that would allow me to set up an attribute index and base my searches 
>> on that, as in search:search("first-letter:A", ...). But I consider that a 
>> last resort; I'd much prefer to handle this within the application rather 
>> than updating the documents.
>> 
>> I thought that using the * wildcard might work, but I haven't been able to 
>> hit upon the right mix of index(es), word lexicon(s), and database config 
>> settings to make that idea work.
>> 
>> Thanks in advance for any advice!
>> Greg
>> 
>> Gregory Murray
>> Digital Library Application Developer
>> Princeton Theological Seminary Library
>> [email protected]
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
