Re: [MarkLogic Dev General] Searching for first letter of an element

Murray, Gregory Fri, 04 Mar 2011 06:34:47 -0800

Mike,

All things considered, we're going to take your advice and add an attribute to 
our documents. That's definitely the most straightforward and, as you said, 
probably the most efficient.
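
For anyone following along, here is a rough sketch of how the value of that attribute might be computed at ingest time. It is plain Python rather than MarkLogic code, and the function name `first_letter_key` is made up for illustration; inside MarkLogic, the xdmp:diacritic-less function Mike mentioned would presumably cover the accent-stripping step. The idea is to decompose the title (NFD), skip the combining marks, and uppercase the first letter or digit that remains.

```python
import unicodedata


def first_letter_key(sort_title: str) -> str:
    """Return the uppercase, accent-free first letter of a title ('' if none).

    Hypothetical helper for illustration only; not part of any MarkLogic API.
    """
    # NFD decomposition splits "É" into "E" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFD", sort_title.strip())
    for ch in decomposed:
        # Skip combining marks left over from the decomposition.
        if unicodedata.combining(ch):
            continue
        # Skip leading punctuation such as quotes or brackets.
        if ch.isalnum():
            return ch.upper()
    return ""


if __name__ == "__main__":
    for title in ["École biblique", "ethics", "Ångström"]:
        print(title, "->", first_letter_key(title))
```

Letters that have no canonical decomposition (for example "Ø") come back unchanged under this approach, so they would need an explicit mapping if they should fold into an ASCII bucket.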


Thanks!
Greg

On Mar 3, 2011, at 11:43 AM, Michael Blakeley wrote:

> You *could* do this with cts:element-value-match 
> (http://developer.marklogic.com/pubs/4.2/apidocs/Lexicons.html#cts:element-value-match)
>  or its element-value cousin, but I would start with that "last resort" 
> instead. Adding a new element (or attribute) strikes me as the most efficient 
> solution. With a relational table you would probably add a column. With XML 
> the analogous action is to add an element.
> 
> If you already have a fair amount of XML ingested, you could use 
> http://marklogic.github.com/corb/ to reprocess it. For ongoing enrichment, 
> CPF might suit your needs: http://developer.marklogic.com/learn/4.2/cpf - and 
> the xdmp:diacritic-less function 
> (http://developer.marklogic.com/pubs/4.2/apidocs/Ext-4.html#xdmp:diacritic-less)
>  might be useful too.
> 
> -- Mike
> 
> On 3 Mar 2011, at 06:55, Murray, Gregory wrote:
> 
>> Hello,
>> 
>> I'm developing a web application in which I want to provide a 
>> browse-by-title feature with an alpha wheel -- by which I mean a row of 
>> links labeled A, B, C, etc. that allow the user to click on a letter and get 
>> back all titles starting with that letter. Under the hood I need to 
>> implement this feature as a search. That is, I don't want to retrieve all 
>> titles and then filter out the titles starting with a particular letter. 
>> That won't scale, and it just seems so inelegant and overwrought. It also 
>> won't work with my pagination code, where I provide links allowing the user 
>> to page through the results in pages of $page-length, because that code 
>> relies on the <search:response> document that search:search() returns 
>> (including relying on @total).
>> 
>> Also, ideally the solution to this problem should include Unicode 
>> normalization, specifically decomposition. Currently we're building a pilot 
>> project, but we expect to have tens of thousands of documents eventually, 
>> some of which might not be in English. I need the search results to include 
>> documents whose title starts with a non-ASCII 
>> character, such as a letter with a diacritical mark. Simply put, when the 
>> user clicks the "E" link I need to retrieve titles starting with "E" but 
>> also ones starting with "É" etc.
>> 
>> In the database config, I've got an element range index on the relevant 
>> element, which in our documents is <sortTitle>, containing the title with 
>> initial articles like a/an/the stripped off.
>> 
>> My first thought was to modify the XML documents themselves to include an 
>> attribute containing the (Unicode normalized) first letter of <sortTitle>. I 
>> assume that would allow me to set up an attribute index and base my searches 
>> on that, as in search:search("first-letter:A", ...). But I consider that a 
>> last resort; I'd much prefer to handle this within the application rather 
>> than updating the documents.
>> 
>> I thought that using the * wildcard might work, but I haven't been able to 
>> hit upon the right mix of index(es), word lexicon(s), and database config 
>> settings to make that idea work.
>> 
>> Thanks in advance for any advice!
>> Greg
>> 
>> Gregory Murray
>> Digital Library Application Developer
>> Princeton Theological Seminary Library
>> [email protected]
>> 
>> _______________________________________________
>> General mailing list
>> [email protected]
>> http://developer.marklogic.com/mailman/listinfo/general
>> 