Re: [MarkLogic Dev General] Searching for first letter of an element

Michael Sokolov Thu, 03 Mar 2011 19:25:30 -0800

We find users generally prefer a "jump-to" feature rather than a 
filter.  That is, rather than showing only the results starting with 
letter "X" (there might not be any), show all results sorted 
alphabetically, beginning at the first result that sorts >= "X". This is 
just a mechanism for positioning by value within a large result list, 
rather than by page number.  To accomplish this we use a range index on 
a processed sort key of some sort (the range index can handle diacritic 
normalization).  The cts:element-range-query() function is at the heart 
of this; it's nice because it can be combined with other query 
parameters.
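
To make the jump-to idea concrete, here is a minimal sketch in plain 
Python rather than MarkLogic code. A sorted list stands in for the range 
index, and the names sort_key and jump_to are made up for illustration. 
It assumes that stripping combining marks after NFD decomposition is an 
acceptable stand-in for the normalization the index would do.

```python
import bisect
import unicodedata


def sort_key(title):
    """Fold case and strip diacritics so that 'É' sorts with 'E'."""
    decomposed = unicodedata.normalize("NFD", title)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def jump_to(titles, start, page_length=20):
    """Return one page of titles, beginning at the first key >= start."""
    # The sorted list plays the role of the range index (an assumption
    # of this sketch, not how MarkLogic stores its indexes).
    ordered = sorted(titles, key=sort_key)
    keys = [sort_key(t) for t in ordered]
    first = bisect.bisect_left(keys, sort_key(start))
    return ordered[first:first + page_length]


titles = ["Zebra", "Éclair", "Echo", "Xylophone", "Waltz", "Yarn"]
print(jump_to(titles, "X", page_length=2))  # ['Xylophone', 'Yarn']
print(jump_to(titles, "Q", page_length=1))  # ['Waltz']
```

Note that jumping to "Q" still returns something useful even though no 
title starts with "Q", which is the advantage over a strict filter.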


I'm not clear whether this approach would fit well with the 
search:search api, though.

Cheers

-Mike

PS - some difficulties arise if you want to be able to page backwards 
from the starting point (e.g. show the last 20 entries at the end of the 
W's in the example above), but it's all doable.
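
A rough sketch of the backwards case, again in plain Python with a 
sorted list standing in for the range index (page_before is a 
hypothetical helper, not a MarkLogic function). The idea is to find the 
same starting position and take the entries just before it, clamping at 
the start of the list:

```python
import bisect


def page_before(keys, start, page_length=20):
    """Return the page of sorted keys ending just before the first key >= start."""
    first = bisect.bisect_left(keys, start)
    # Clamp at zero so a jump near the beginning returns a short page
    # instead of wrapping around to the end of the list.
    return keys[max(0, first - page_length):first]


keys = sorted(["waltz", "water", "wind", "xenon", "xylophone"])
print(page_before(keys, "x", page_length=2))  # ['water', 'wind']
```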

On 3/3/2011 9:55 AM, Murray, Gregory wrote:
> Hello,
>
> I'm developing a web application in which I want to provide a browse-by-title 
> feature with an alpha wheel -- by which I mean a row of links labeled A, B, 
> C, etc. that allow the user to click on a letter and get back all titles 
> starting with that letter. Under the hood I need to implement this feature as 
> a search. That is, I don't want to retrieve all titles and then filter out 
> the titles starting with a particular letter. That won't scale, and it just 
> seems so inelegant and overwrought. It also won't work with my pagination 
> code, where I provide links allowing the user to page through the results in 
> pages of $page-length, because that code relies on the<search:response>  
> document that search:search() returns (including relying on @total).
>
> Also, ideally the solution to this problem should include Unicode 
> normalization, specifically decomposition. Currently we're building a pilot 
> project, but we expect to have tens of thousands of documents eventually, 
> some of which might not be in English. I need the search results to include 
> documents where the first letter of the title starts with a non-ASCII 
> character, such as a letter with a diacritical mark. Simply put, when the 
> user clicks the "E" link I need to retrieve titles starting with "E" but also 
> ones starting with "É" etc.
>
> In the database config, I've got an element range index on the relevant 
> element, which in our documents is <sortTitle>, containing the title with 
> initial articles like a/an/the stripped off.
>
> My first thought was to modify the XML documents themselves to include an 
> attribute containing the (Unicode normalized) first letter of <sortTitle>. I 
> assume that would allow me to set up an attribute index and base my searches 
> on that, as in search:search("first-letter:A", ...). But I consider that a 
> last resort; I'd much prefer to handle this within the application rather 
> than updating the documents.
>
> I thought that using the * wildcard might work, but I haven't been able to 
> hit upon the right mix of index(es), word lexicon(s), and database config 
> settings to make that idea work.
>
> Thanks in advance for any advice!
> Greg
>
> Gregory Murray
> Digital Library Application Developer
> Princeton Theological Seminary Library
> [email protected]
>
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
