That is a good idea. Unfortunately I couldn't get it to work. I added a custom 
dictionary entry for "judgement", followed by a reindex. Maybe you can't do 
this with stemming? The documentation suggests you can, giving as examples 
"aluminum" and "aluminium," but there's no code example.

cdict:dictionary-read("en") =>
<cdict:dictionary xmlns:cdict="http://marklogic.com/xdmp/custom-dictionary"
    xml:lang="en">
  <cdict:entry>
    <cdict:word>Judgement</cdict:word>
    <cdict:stem>Judgment</cdict:stem>
    <cdict:pos>Nn</cdict:pos>
  </cdict:entry>
</cdict:dictionary>

xdmp:estimate(cts:search(//document,cts:word-query("judgement","case-insensitive")))
 => 0
xdmp:estimate(cts:search(//document,cts:word-query("judgment","case-insensitive")))
 => 3220
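
For reference, here is a minimal sketch of how an entry like the one above 
might be written with the custom dictionary library module. It assumes the 
standard /MarkLogic/custom-dictionary.xqy import; the lowercase word and stem 
are an assumption for illustration, not a documented requirement, and nothing 
here is confirmed to fix the zero-result estimate.

```xquery
xquery version "1.0-ml";

import module namespace cdict = "http://marklogic.com/xdmp/custom-dictionary"
  at "/MarkLogic/custom-dictionary.xqy";

(: Assumption: map the UK spelling onto the US stem for English content. :)
(: Lowercase forms are an illustrative choice, not a known requirement. :)
let $dict :=
  <cdict:dictionary xmlns:cdict="http://marklogic.com/xdmp/custom-dictionary"
      xml:lang="en">
    <cdict:entry>
      <cdict:word>judgement</cdict:word>
      <cdict:stem>judgment</cdict:stem>
      <cdict:pos>Nn</cdict:pos>
    </cdict:entry>
  </cdict:dictionary>
(: Replaces the custom dictionary for language "en". :)
return cdict:dictionary-write("en", $dict)
```

The queries above pass only "case-insensitive". If stemmed searches are 
enabled on the database, it may be worth rerunning the estimate in a separate 
request with the "stemmed" option added, e.g. 
cts:word-query("judgement", ("case-insensitive", "stemmed")), since a custom 
stem would presumably only matter for a stemmed query.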

-Will

From: [email protected] 
[mailto:[email protected]] On Behalf Of Harry B.
Sent: Wednesday, July 18, 2012 4:48 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] spell:suggest behavior


Interesting caveat. I guess that's why spell check didn't scream at me. I 
wonder if language-specific stemming (specifying en-us) could handle this? In 
my mind this is similar to color vs. colour: relying on stemming, you could 
get results with both spellings.
On Jul 18, 2012 5:31 PM, "Will Thompson" <[email protected]> wrote:
Harry - I think in the UK the extra "e" is correct, but "judgment" is the 
correct spelling in the US. This is really the only word giving us trouble (our 
content is law-related), and I have some simple logic in place to check and 
provide the correct suggestion, but I thought there might be a better way.  
Thank you for this info!

-Will

From: [email protected] 
[mailto:[email protected]] On Behalf Of Harry B.
Sent: Wednesday, July 18, 2012 12:52 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] spell:suggest behavior

Are you sure you have these the right way around? Judgement is spelled 
correctly...

It looks like the double metaphones may be giving more weight to the 
suggestions than you'd want in this case. The word distances and Levenshtein 
distances are higher for the suggestions than for judgment. I don't know of a 
way around this sort of thing. I've run across words from time to time where 
the suggestions aren't what I'd expect, or aren't available at all. In my test 
dictionary, even taking the other words out, so that judgement was the only 
word left, still didn't correct judgment to judgement. I think this is a word 
where you'll need other logic to catch this specific misspelling: before using 
the dictionary to check spelling, look at another list and see if the word is 
there. This can be done in a performant way. In fact, regular expressions run 
extremely fast, and you could keep a list of words like this that need forced 
suggestions.
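
The "check another list first" idea could be sketched roughly as follows. This 
is only an illustration: it uses a map lookup rather than regular expressions, 
and $OVERRIDES and local:suggest are hypothetical names, not MarkLogic 
built-ins. The dictionary URI is the one from the original question.

```xquery
xquery version "1.0-ml";

(: Hypothetical override list: misspelling -> forced suggestion. :)
declare variable $OVERRIDES as map:map :=
  let $m := map:map()
  let $_ := map:put($m, "judgement", "judgment")
  return $m;

(: Illustrative wrapper: consult the override list before the dictionary. :)
declare function local:suggest(
  $dictionary as xs:string,
  $word as xs:string
) as xs:string*
{
  let $forced := map:get($OVERRIDES, fn:lower-case($word))
  return
    if (fn:exists($forced))
    then $forced
    (: No override found: fall back to the normal spelling suggestions. :)
    else spell:suggest($dictionary, $word)
};

local:suggest("jmp-dictionary.xml", "judgement")
```

With the override in place the call returns "judgment" without consulting the 
dictionary at all; any word not in the list behaves exactly as spell:suggest 
does today.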

spell:double-metaphone("judgment") => jtkmnt atkmnt
spell:double-metaphone("judgement") => jjmnt ajmnt

spell:double-metaphone("augment") => akmnt
spell:double-metaphone("oddment") => akmnt
spell:double-metaphone("element") => almnt
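
The edit distances can be compared in the same way as the metaphones. A small 
sketch, assuming the built-in spell:levenshtein-distance function and using 
some of the suggestions from the original question as candidates:

```xquery
xquery version "1.0-ml";

(: Print the Levenshtein distance from "judgement" to each candidate. :)
for $candidate in ("judgment", "augment", "element", "oddment")
return fn:concat($candidate, ": ",
  spell:levenshtein-distance("judgement", $candidate))
```

"judgment" is a single deletion away from "judgement", so it should come out 
with the smallest distance of the four.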


On Wed, Jul 18, 2012 at 12:55 PM, Will Thompson <[email protected]> wrote:
Is there a way to force different behavior of spell:suggest()? For example, 
although the correct spelling, "judgment," is in the dictionary, these are the 
suggestions for the most common misspelling:

spell:suggest("jmp-dictionary.xml", "judgement")
=> augment element oddment Sagemont easement regiment

As far as I can tell, this is not correctable with a custom dictionary.

-Will
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general

