Hello,

I'm developing a web application in which I want to provide a browse-by-title 
feature with an alpha wheel -- by which I mean a row of links labeled A, B, C, 
etc. that allow the user to click on a letter and get back all titles starting 
with that letter. Under the hood I need to implement this feature as a search. 
That is, I don't want to retrieve all titles and then filter out the titles 
starting with a particular letter. That won't scale, and it just seems so 
inelegant and overwrought. It also won't work with my pagination code, where I 
provide links allowing the user to page through the results in pages of 
$page-length, because that code relies on the <search:response> document that 
search:search() returns (including relying on @total).
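
For context, the pagination arithmetic I depend on is roughly the following. This is only a Python sketch, not my actual XQuery: page_starts is a made-up name, total stands in for @total, page_length for $page-length, and I'm assuming 1-based start positions as with the $start parameter of search:search().

```python
import math


def page_starts(total: int, page_length: int) -> list[int]:
    """Return the 1-based start position of every results page.

    `total` plays the role of @total on <search:response>;
    `page_length` plays the role of $page-length (both names are
    illustrative, not taken from the real application).
    """
    if page_length <= 0:
        raise ValueError("page_length must be positive")
    page_count = math.ceil(total / page_length)
    # Page i (0-based) begins at result number 1 + i * page_length.
    return [1 + i * page_length for i in range(page_count)]
```

Without an accurate total for just the titles under one letter, the page links can't be computed, which is why filtering after retrieval doesn't fit.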

Also, ideally the solution to this problem should include Unicode 
normalization, specifically decomposition. Currently we're building a pilot 
project, but we expect to have tens of thousands of documents eventually, some 
of which might not be in English. I need the search results to include 
documents whose title starts with a non-ASCII character, such as a letter 
with a diacritical mark. Simply put, when the user 
clicks the "E" link I need to retrieve titles starting with "E" but also ones 
starting with "É" etc.
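
To illustrate what I mean by decomposition, here is a minimal sketch in Python (not the XQuery I would actually run in the application). The function name wheel_letter and the "#" bucket for everything outside A-Z are my own assumptions for illustration.

```python
import unicodedata


def wheel_letter(title: str) -> str:
    """Map a title to its A-Z wheel letter, or '#' if none applies.

    Hypothetical helper: the '#' catch-all bucket is an assumption.
    """
    stripped = title.strip()
    if not stripped:
        return "#"
    # NFD splits a precomposed character such as "É" into the base
    # letter "E" followed by the combining acute accent U+0301.
    decomposed = unicodedata.normalize("NFD", stripped[0])
    # Drop the combining marks so that only the base character remains.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # upper() can expand one character into two (e.g. "ß" -> "SS"),
    # so keep only the first character of the result.
    letter = base.upper()[:1]
    # Letters with no canonical decomposition (e.g. "Ø", "Æ") stay
    # outside A-Z and end up in the '#' bucket in this sketch.
    return letter if "A" <= letter <= "Z" else "#"
```

With this, "Église" and "étude" both land under "E", which is the behavior I'm after on the search side.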

In the database config, I've got an element range index on the relevant 
element, which in our documents is <sortTitle>, containing the title with 
initial articles like a/an/the stripped off.

My first thought was to modify the XML documents themselves to include an 
attribute containing the (Unicode normalized) first letter of <sortTitle>. I 
assume that would allow me to set up an attribute index and base my searches on 
that, as in search:search("first-letter:A", ...). But I consider that a last 
resort; I'd much prefer to handle this within the application rather than 
updating the documents.

I thought that using the * wildcard might work, but I haven't been able to hit 
upon the right mix of index(es), word lexicon(s), and database config settings 
to make that idea work.

Thanks in advance for any advice!
Greg

Gregory Murray
Digital Library Application Developer
Princeton Theological Seminary Library
[email protected]
