Re: [MarkLogic Dev General] Searching for first letter of an element

Murray, Gregory Fri, 04 Mar 2011 06:36:53 -0800

Mike,

Nice idea. Hadn't thought of that. Thanks!


Greg


On Mar 3, 2011, at 10:25 PM, Michael Sokolov wrote:

> We find users generally prefer a "jump-to" feature rather than a filter. That
> is, rather than showing only the results starting with letter "X" (there
> might not be any), show all results sorted alphabetically, beginning at the
> first result sorting >= "X". This is just a mechanism for positioning by value
> within a large result list, rather than by page number. To accomplish this
> we use a range index on a processed sortkey of some sort (the range index can
> handle diacritic normalization). The cts:element-range-query() function is
> at the heart of this; it's nice because it can be combined with other query
> parameters.
> 
> I'm not clear whether this approach would fit well with the search:search 
> api, though.
> 
> Cheers
> 
> -Mike
> 
> PS: some difficulties arise if you want to be able to page backwards from
> the starting point (for instance, to show the last 20 entries at the end of
> the W's in the example above), but it's all doable.
> 
> On 3/3/2011 9:55 AM, Murray, Gregory wrote:
>> Hello,
>> 
>> I'm developing a web application in which I want to provide a 
>> browse-by-title feature with an alpha wheel -- by which I mean a row of 
>> links labeled A, B, C, etc. that allow the user to click on a letter and get 
>> back all titles starting with that letter. Under the hood I need to 
>> implement this feature as a search. That is, I don't want to retrieve all 
>> titles and then filter out the titles starting with a particular letter. 
>> That won't scale, and it just seems so inelegant and overwrought. It also 
>> won't work with my pagination code, where I provide links allowing the user 
>> to page through the results in pages of $page-length, because that code 
>> relies on the <search:response> document that search:search() returns 
>> (including relying on @total).
>> 
>> Also, ideally the solution to this problem should include Unicode 
>> normalization, specifically decomposition. Currently we're building a pilot 
>> project, but we expect to have tens of thousands of documents eventually, 
>> some of which might not be in English. I need the search results to include 
>> documents whose titles start with a non-ASCII character, such as a letter 
>> with a diacritical mark. Simply put, when the 
>> user clicks the "E" link I need to retrieve titles starting with "E" but 
>> also ones starting with "É" etc.
>> 
>> In the database config, I've got an element range index on the relevant 
>> element, which in our documents is <sortTitle>, containing the title with 
>> initial articles like a/an/the stripped off.
>> 
>> My first thought was to modify the XML documents themselves to include an 
>> attribute containing the (Unicode normalized) first letter of <sortTitle>. I 
>> assume that would allow me to set up an attribute index and base my searches 
>> on that, as in search:search("first-letter:A", ...). But I consider that a 
>> last resort; I'd much prefer to handle this within the application rather 
>> than updating the documents.
>> 
>> I thought that using the * wildcard might work, but I haven't been able to 
>> hit upon the right mix of index(es), word lexicon(s), and database config 
>> settings to make that idea work.
>> 
>> Thanks in advance for any advice!
>> Greg
>> 
>> Gregory Murray
>> Digital Library Application Developer
>> Princeton Theological Seminary Library
>> [email protected]
>> 
>> _______________________________________________
>> General mailing list
>> [email protected]
>> http://developer.marklogic.com/mailman/listinfo/general
> 
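
The jump-to positioning Mike describes can be sketched outside MarkLogic. What follows is a minimal Python illustration, not XQuery and not the implementation used by anyone in this thread: a sorted in-memory list stands in for the range index, and the names sort_key, build_index, jump_to, and page_before are hypothetical. It assumes the processed sort key comes from canonical decomposition (NFD), dropping combining marks, and case folding; in MarkLogic that work would be left to the range index, as Mike notes.

```python
import bisect
import unicodedata


def sort_key(title):
    """Hypothetical sort key: decompose (NFD), drop combining marks, fold case."""
    decomposed = unicodedata.normalize("NFD", title)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def build_index(titles):
    """Stand-in for a range index: (key, title) pairs sorted by key."""
    return sorted((sort_key(t), t) for t in titles)


def _position(index, letter):
    """Offset of the first entry whose key sorts >= the key for `letter`."""
    keys = [key for key, _ in index]
    return bisect.bisect_left(keys, sort_key(letter))


def jump_to(index, letter, page_length=20):
    """One page of titles starting at the jump-to position."""
    start = _position(index, letter)
    return [title for _, title in index[start:start + page_length]]


def page_before(index, letter, page_length=20):
    """The page that ends just before the jump-to position (paging backwards)."""
    start = _position(index, letter)
    return [title for _, title in index[max(0, start - page_length):start]]


titles = ["Zebra", "École", "Eden", "Apple", "Water", "Wisdom", "Yarn"]
index = build_index(titles)

print(jump_to(index, "E", 2))      # ['École', 'Eden']
print(jump_to(index, "X", 2))      # ['Yarn', 'Zebra'] (nothing starts with X)
print(page_before(index, "X", 2))  # ['Water', 'Wisdom']
```

Because the position is an offset into the full sorted list, ordinary page-length arithmetic still applies on either side of it, which is what makes paging backwards from the starting point workable.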

