You *could* do this with cts:element-value-match (http://developer.marklogic.com/pubs/4.2/apidocs/Lexicons.html#cts:element-value-match) or its attribute cousin, cts:element-attribute-value-match, but I would start with that "last resort" instead. Adding a new element (or attribute) strikes me as the most efficient solution. With a relational table you would probably add a column. With XML the analogous action is to add an element.
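As a rough illustration of the enrichment idea, here is a minimal sketch that derives a normalized first letter from <sortTitle> and stores it in a new sibling element. The document URI, the <firstLetter> element name, and the local:first-letter helper are all made up for the example, and it assumes <sortTitle> is in no namespace. Adjust it to your own content model before relying on it.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: strip diacritics, take the first character,
   and upper-case it, so "École" and "ecole" both yield "E". :)
declare function local:first-letter($title as xs:string) as xs:string
{
  fn:upper-case(
    fn:substring(xdmp:diacritic-less(fn:normalize-space($title)), 1, 1))
};

(: Placeholder URI: substitute one of your own documents. :)
let $uri := "/example/doc.xml"

(: The "for" makes this a no-op when the document has no <sortTitle>. :)
for $sort-title in (fn:doc($uri)//sortTitle)[1]
return
  (: <firstLetter> is an assumed element name, inserted right after
     <sortTitle> so it can later be given its own range index. :)
  xdmp:node-insert-after(
    $sort-title,
    element firstLetter { local:first-letter(fn:string($sort-title)) })
```

With an element range index on the new element and a matching constraint in the search options, a query along the lines of search:search("first-letter:E", ...) should then be possible, as described in the original question.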
If you already have a fair amount of XML ingested, you could use http://marklogic.github.com/corb/ to reprocess it. For ongoing enrichment, CPF might suit your needs: http://developer.marklogic.com/learn/4.2/cpf - and the xdmp:diacritic-less function (http://developer.marklogic.com/pubs/4.2/apidocs/Ext-4.html#xdmp:diacritic-less) might be useful too.

-- Mike

On 3 Mar 2011, at 06:55, Murray, Gregory wrote:

> Hello,
>
> I'm developing a web application in which I want to provide a browse-by-title
> feature with an alpha wheel -- by which I mean a row of links labeled A, B,
> C, etc. that allow the user to click on a letter and get back all titles
> starting with that letter. Under the hood I need to implement this feature as
> a search. That is, I don't want to retrieve all titles and then filter out
> the titles starting with a particular letter. That won't scale, and it just
> seems so inelegant and overwrought. It also won't work with my pagination
> code, where I provide links allowing the user to page through the results in
> pages of $page-length, because that code relies on the <search:response>
> document that search:search() returns (including relying on @total).
>
> Also, ideally the solution to this problem should include Unicode
> normalization, specifically decomposition. Currently we're building a pilot
> project, but we expect to have tens of thousands of documents eventually,
> some of which might not be in English. I need the search results to include
> documents where the first letter of the title starts with a non-ASCII
> character, such as a letter with a diacritical mark. Simply put, when the
> user clicks the "E" link I need to retrieve titles starting with "E" but also
> ones starting with "É" etc.
>
> In the database config, I've got an element range index on the relevant
> element, which in our documents is <sortTitle>, containing the title with
> initial articles like a/an/the stripped off.
>
> My first thought was to modify the XML documents themselves to include an
> attribute containing the (Unicode normalized) first letter of <sortTitle>. I
> assume that would allow me to set up an attribute index and base my searches
> on that, as in search:search("first-letter:A", ...). But I consider that a
> last resort; I'd much prefer to handle this within the application rather
> than updating the documents.
>
> I thought that using the * wildcard might work, but I haven't been able to
> hit upon the right mix of index(es), word lexicon(s), and database config
> settings to make that idea work.
>
> Thanks in advance for any advice!
> Greg
>
> Gregory Murray
> Digital Library Application Developer
> Princeton Theological Seminary Library
> [email protected]
>
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
