Thanks Mary. Your suggestion worked. Just to close the loop on this topic: I created a field and used the "override tokenization" option on it to make the apostrophe part of the word. With that field in place, I can use a custom dictionary to make "Int'l" stem to "International". A search for "Int'l" is now converted to its stem, "International", and returns both documents containing "International" and documents containing "Int'l" (because those documents are now indexed under their stem, "International").
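
For anyone who finds this thread later, here is a rough sketch of that setup done through the Admin API rather than the Admin UI. It is not the exact code used here: the database name "my-database" and the field name "abbrev" are placeholders, the field is assumed to exist already, and the function names should be checked against the Admin API and custom-dictionary docs for your MarkLogic version.

```xquery
xquery version "1.0-ml";

(: Sketch only. "my-database" and "abbrev" are hypothetical names;
   the field "abbrev" is assumed to have been created already. :)
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";
import module namespace cdict = "http://marklogic.com/xdmp/custom-dictionary"
  at "/MarkLogic/custom-dictionary.xqy";

(: Step 1: within the field, classify the apostrophe as a word character
   so that "Int'l" is tokenized as a single word. :)
let $config := admin:get-configuration()
let $dbid := xdmp:database("my-database")
let $override := admin:database-tokenizer-override("'", "word")
let $config := admin:database-add-field-tokenizer-override(
  $config, $dbid, "abbrev", $override)

(: Step 2: custom English dictionary entry mapping the abbreviation
   to its stem. :)
let $dict :=
  <dictionary xmlns="http://marklogic.com/xdmp/custom-dictionary">
    <entry>
      <word>Int'l</word>
      <stem>International</stem>
    </entry>
  </dictionary>

return (
  admin:save-configuration($config),
  cdict:dictionary-write("en", $dict)
)
```

Once the database has finished reindexing, a field query along the lines of cts:search(fn:doc(), cts:field-word-query("abbrev", "Int'l")) should match documents containing either form. As Mary points out below, the override applies only to that field, so plain word queries outside the field still break on the apostrophe.
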
Thanks,
David

-----Original Message-----
From: Rhodes, David (LNG-CON)
Sent: Thursday, July 23, 2015 9:09 AM
To: MarkLogic Developer Discussion
Subject: RE: [MarkLogic Dev General] Custom dictionary for stemming

Thank you Mary. You are right about the punctuation creating the problem. Words and stems without punctuation work just fine with my new custom dictionary.

We have a long list of industry-specific abbreviations and synonyms. I am experimenting with using stemming (instead of a thesaurus) to make searches return the same results regardless of whether the user searches for the full word or the abbreviation. Most of these abbreviations contain punctuation (either a period or an apostrophe).

All these values are in the same field. So I'll take your advice and investigate creating a custom tokenization for that field.

Thank you,
David

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Mary Holstege
Sent: Wednesday, July 22, 2015 11:57 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Custom dictionary for stemming

It may be a tokenization thing -- the apostrophe is causing a word break so your custom stem is never matched. What does this give you: cts:tokenize(cts:stem("Int'l"))? Do things work as you expect for a custom stem that doesn't have a punctuation character in it?

A workaround for that is to create a field custom tokenization override making apostrophe a word character. That will be confined to that specific field, however, and not to word queries in general.

Regardless, you should probably report a bug to ML support.

//Mary

On Wed, 22 Jul 2015 08:02:33 -0700, Rhodes, David (LNG-CON) <[email protected]> wrote:

> I am trying to use a custom dictionary to extend the set of stemmed
> words.
>
> I am using MarkLogic 7.0, and have been following the documentation
> guides in Chapters 17 and 18:
> http://docs.marklogic.com/7.0/guide/search-dev/stemming
> http://docs.marklogic.com/7.0/guide/search-dev/custom-dictionaries
>
> I noted that there are two ways to see if words are resolving to their
> stems:
>
> cts:stem(word) returns the stems of word
>
> and
>
> cts:contains(word, stem) returns true if these two terms resolve to
> the same stem
>
> I confirmed that both of these work for terms that are in the default
> dictionary (e.g., run and running, bite and bitten)
>
> I have added a custom dictionary that adds "Int'l" as a word with
> "International" as its stem.
>
> cdict:dictionary-write("en",$dict)
>
> With that dictionary added as the custom dictionary for English,
> cts:stem works but cts:contains does not.
> cts:stem("Int'l") returns International
> cts:contains("Int'l", "International") returns false
>
> I reindexed my database, since I understand that my dictionary entry
> means that all documents containing "Int'l" should now be indexed
> under "International".
>
> cts:contains("Int'l", "International") still returns false
> Furthermore, in the real search work flow that I am doing, searches
> for "Int'l" do not return documents containing "International" (But
> searches for "bitten" do return documents containing "bite").
>
> My database indexes are set to Stemmed Searches = Basic, and Word
> Searches = False.
>
> I think that stemming can be a powerful feature for my work flow, if I
> can just get it to work. Thank you for any advice you can offer.
>
> David

--
Using Opera's revolutionary email client: http://www.opera.com/mail/
_______________________________________________
General mailing list
[email protected]
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general
