You *could* do this with cts:element-value-match
(http://developer.marklogic.com/pubs/4.2/apidocs/Lexicons.html#cts:element-value-match)
or its element-value cousin, but I would start with the approach you call a
"last resort" instead. Adding a new element (or attribute) strikes me as the most efficient 
solution. With a relational table you would probably add a column. With XML the 
analogous action is to add an element.

If you already have a fair amount of XML ingested, you could use CoRB
(http://marklogic.github.com/corb/) to reprocess it. For ongoing enrichment, CPF
might suit your needs: http://developer.marklogic.com/learn/4.2/cpf. The
xdmp:diacritic-less function
(http://developer.marklogic.com/pubs/4.2/apidocs/Ext-4.html#xdmp:diacritic-less)
might be useful too, for folding "É" to "E" when you compute the first letter.
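
As a rough, language-neutral sketch of the value the new element or attribute
would hold: the Python below is only an illustration, not MarkLogic code. In
practice you would compute the same thing in XQuery, for example with
xdmp:diacritic-less. The function name first_letter_key is hypothetical, and
skipping leading punctuation and digits is my assumption, not a requirement
from your description.

```python
import unicodedata


def first_letter_key(sort_title: str) -> str:
    """Return the upper-cased first letter of a title with diacritics removed.

    Hypothetical helper: returns "" when the title contains no letter.
    """
    # NFD (canonical decomposition) splits a precomposed letter such as
    # "É" (U+00C9) into the base letter "E" plus a combining mark (U+0301).
    decomposed = unicodedata.normalize("NFD", sort_title)
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop combining marks (accents, rings, cedillas, ...)
        if ch.isalpha():
            return ch.upper()  # first base letter found
        # Assumption: anything else (spaces, quotes, digits) is skipped.
    return ""
```

Letters that have no canonical decomposition (for example "Ø") come back
unchanged, so you would need to decide separately which link they belong to.
Stored once per document at load time, a value like this can back a range
index and a simple first-letter constraint.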

-- Mike

On 3 Mar 2011, at 06:55 , Murray, Gregory wrote:

> Hello,
> 
> I'm developing a web application in which I want to provide a browse-by-title 
> feature with an alpha wheel -- by which I mean a row of links labeled A, B, 
> C, etc. that allow the user to click on a letter and get back all titles 
> starting with that letter. Under the hood I need to implement this feature as 
> a search. That is, I don't want to retrieve all titles and then filter out 
> the titles starting with a particular letter. That won't scale, and it just 
> seems so inelegant and overwrought. It also won't work with my pagination 
> code, where I provide links allowing the user to page through the results in 
> pages of $page-length, because that code relies on the <search:response> 
> document that search:search() returns (including relying on @total).
> 
> Also, ideally the solution to this problem should include Unicode 
> normalization, specifically decomposition. Currently we're building a pilot 
> project, but we expect to have tens of thousands of documents eventually, 
> some of which might not be in English. I need the search results to include 
> documents where the title starts with a non-ASCII 
> character, such as a letter with a diacritical mark. Simply put, when the 
> user clicks the "E" link I need to retrieve titles starting with "E" but also 
> ones starting with "É" etc.
> 
> In the database config, I've got an element range index on the relevant 
> element, which in our documents is <sortTitle>, containing the title with 
> initial articles like a/an/the stripped off.
> 
> My first thought was to modify the XML documents themselves to include an 
> attribute containing the (Unicode normalized) first letter of <sortTitle>. I 
> assume that would allow me to set up an attribute index and base my searches 
> on that, as in search:search("first-letter:A", ...). But I consider that a 
> last resort; I'd much prefer to handle this within the application rather 
> than updating the documents.
> 
> I thought that using the * wildcard might work, but I haven't been able to 
> hit upon the right mix of index(es), word lexicon(s), and database config 
> settings to make that idea work.
> 
> Thanks in advance for any advice!
> Greg
> 
> Gregory Murray
> Digital Library Application Developer
> Princeton Theological Seminary Library
> [email protected]
> 
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
> 
