Hello,
I'm developing a web application in which I want to provide a browse-by-title
feature with an alpha wheel -- by which I mean a row of links labeled A, B, C,
etc. that allow the user to click on a letter and get back all titles starting
with that letter. Under the hood I need to implement this feature as a search.
That is, I don't want to retrieve all titles and then filter out the titles
starting with a particular letter. That won't scale, and it just seems so
inelegant and overwrought. It also won't work with my pagination code, where I
provide links allowing the user to page through the results in pages of
$page-length, because that code relies on the <search:response> document that
search:search() returns (including relying on @total).
Also, ideally the solution to this problem should include Unicode
normalization, specifically decomposition. Currently we're building a pilot
project, but we expect to have tens of thousands of documents eventually, some
of which might not be in English. I need the search results to include
documents whose title starts with a non-ASCII character, such as a letter
with a diacritical mark. Simply put, when the user
clicks the "E" link I need to retrieve titles starting with "E" but also ones
starting with "É" etc.
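
To make the decomposition requirement concrete, here is a minimal sketch in
Python (used only for illustration; the application itself runs on
search:search() in XQuery). The function name alpha_wheel_letter and the
sample titles are hypothetical. It shows only the intended letter mapping,
not the index-backed search being asked about.

```python
import unicodedata


def alpha_wheel_letter(title):
    """Return the uppercase base letter of a title's first character.

    Hypothetical helper: decomposes the first character (NFD) so that a
    letter with a diacritical mark maps to its base letter.
    """
    stripped = title.strip()
    if not stripped:
        return None
    # NFD splits a precomposed character such as U+00C9 into the base
    # letter followed by combining marks; the base letter comes first.
    decomposed = unicodedata.normalize("NFD", stripped[0])
    return decomposed[0].upper()


# Made-up sample titles, for illustration only.
titles = ["\u00c9l\u00e9ments de th\u00e9ologie", "Ethics", "\u00e9lan", "Zwingli"]

# Titles that a click on the "E" link should return.
e_titles = [t for t in titles if alpha_wheel_letter(t) == "E"]
```

Filtering a list in memory like this is exactly what the application should
avoid at scale; the sketch is only meant to pin down which titles belong
under which letter.
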
In the database config, I've got an element range index on the relevant
element, which in our documents is <sortTitle>, containing the title with
initial articles like a/an/the stripped off.
My first thought was to modify the XML documents themselves to include an
attribute containing the (Unicode normalized) first letter of <sortTitle>. I
assume that would allow me to set up an attribute index and base my searches on
that, as in search:search("first-letter:A", ...). But I consider that a last
resort; I'd much prefer to handle this within the application rather than
updating the documents.
I thought that using the * wildcard might work, but I haven't been able to hit
upon the right mix of index(es), word lexicon(s), and database config settings
to make that idea work.
Thanks in advance for any advice!
Greg
Gregory Murray
Digital Library Application Developer
Princeton Theological Seminary Library