Hi Folks,

 

I'm not up to speed on some of the latest innovations and practices in MarkLogic,
and rather than dig through it all myself I figured I'd save a little time and
ask.  Here's what I would really like to do:

 

Given mixed content such as:


<full-name> <surname>James</surname>, <first-name>John</first-name></full-name>

 

I would like to create range indexes on full-name, surname, and first-name
without having to create a separate full-name element that contains no
sub-elements.  That would let me do the following:



1. Obtain a lexicon of full-names for search purposes.

2. Provide a master database and schema from which derivative documents
can be extracted using less granular elements (in this case <full-name> without
<surname> and <first-name>), so that I can use an element range index on
<full-name> in the master database to analyze and/or normalize any and all
variations of <full-name>.

3. Further analyze aberrant <full-name> forms to develop enhanced parsing
algorithms for obtaining the surname and first-name (and, for that matter,
middle names, prefixes, and suffixes, as a real-world scenario would require,
unlike this limited example).
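
To make the goal concrete, here is a minimal sketch in Python (not MarkLogic
code) of the values such indexes would need to hold. It assumes the value of a
mixed-content <full-name> is its concatenated text with whitespace collapsed;
whether a MarkLogic range index treats mixed content this way is exactly the
open question. The <names> wrapper, the extra sample entries, and the function
names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the <names> wrapper and the second and third entries
# are invented for illustration; only the first <full-name> is from the post.
SAMPLE = """<names>
  <full-name> <surname>James</surname>, <first-name>John</first-name></full-name>
  <full-name><surname>James</surname>, <first-name>John</first-name> </full-name>
  <full-name>Mary Smith</full-name>
</names>"""


def string_value(elem):
    """Concatenate all descendant text and collapse runs of whitespace."""
    return " ".join("".join(elem.itertext()).split())


def build_lexicons(xml_text, names=("full-name", "surname", "first-name")):
    """Return a sorted list of distinct values for each element name."""
    root = ET.fromstring(xml_text)
    lexicons = {}
    for name in names:
        values = {string_value(e) for e in root.iter(name)}
        values.discard("")  # ignore elements with no text at all
        lexicons[name] = sorted(values)
    return lexicons


if __name__ == "__main__":
    for name, values in build_lexicons(SAMPLE).items():
        print(name, values)
```

Under that assumption, the mixed-content entry and a flat entry such as
<full-name>Mary Smith</full-name> land in the same full-name lexicon, while
surname and first-name remain individually available where they are marked up.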

 

Thanks for any help with this!

Tim Meagher



_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
