Hi all

I have had some requests for the Ukrainian module to support hyphenated
words better in our spellchecker. Ukrainian has many special hyphenated
words that need to be in the dictionary, but it also has many hyphenated
compounds in which both parts are simply independent words (possibly
inflected).
So in general I want the spellchecker to check the whole word first and,
if it is not in the dictionary, split it at the hyphen and check whether
each part is there.
I used a trick with the BREAK option to support this in hunspell.
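
To make the intended behaviour concrete, here is a minimal standalone
sketch in plain Java. It is not the patch itself: the class name, the
toy dictionary and its transliterated entries are assumptions made up
for illustration, and a HashSet stands in for the real speller. Treating
an empty part (as in "word-") as misspelled is also my assumption here;
the patch below does not do that.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class HyphenFallbackCheck {

  // Toy stand-in for the real dictionary (hypothetical, transliterated entries).
  private static final Set<String> DICTIONARY = new HashSet<>(Arrays.asList(
      "bud-yakyi",          // special hyphenated word, stored whole
      "synio", "zhovtyi")); // independent words that may form a compound

  public static boolean inDictionary(String word) {
    return DICTIONARY.contains(word);
  }

  public static boolean isMisspelled(String word) {
    // Step 1: the whole word wins if the dictionary knows it.
    if (inDictionary(word)) {
      return false;
    }
    // Step 2: without a hyphen there is nothing left to try.
    if (!word.contains("-")) {
      return true;
    }
    // Step 3: every hyphen-separated part must be a known word.
    // The limit -1 keeps trailing empty parts so that "synio-" is rejected.
    for (String part : word.split("-", -1)) {
      if (part.isEmpty() || !inDictionary(part)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(isMisspelled("bud-yakyi"));     // false: whole word is known
    System.out.println(isMisspelled("synio-zhovtyi")); // false: both parts are known
    System.out.println(isMisspelled("synio-zhovtyy")); // true: second part is unknown
  }
}
```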

I could not quickly find out whether this is easy to support in LT, so I
wrote this change, which seems to work pretty well (although it needs
some more work to give better suggestions for misspelled compound words).
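
One possible direction for the suggestions, sketched under heavy
assumptions: correct each hyphen-separated part on its own and join the
candidates back with hyphens. Nothing below is in the patch. The class,
the KNOWN set and the CORRECTIONS map are hypothetical stand-ins for
whatever the real speller would return, and the entries are
transliterated toy data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CompoundSuggestions {

  // Hypothetical toy data standing in for the dictionary and its suggestions.
  private static final Set<String> KNOWN =
      new HashSet<>(Arrays.asList("synio", "zhovtyi", "zhovta"));
  private static final Map<String, List<String>> CORRECTIONS = new HashMap<>();
  static {
    CORRECTIONS.put("zhovtyy", Arrays.asList("zhovtyi", "zhovta"));
  }

  // A known part is its own only candidate; an unknown part gets its corrections.
  public static List<String> suggestForPart(String part) {
    if (KNOWN.contains(part)) {
      return Collections.singletonList(part);
    }
    List<String> candidates = CORRECTIONS.get(part);
    return candidates == null ? Collections.<String>emptyList() : candidates;
  }

  // Combine the candidates of every part; empty if any part has no candidate.
  public static List<String> suggestCompound(String word) {
    List<String> results = new ArrayList<>();
    results.add("");
    boolean first = true;
    for (String part : word.split("-", -1)) {
      List<String> next = new ArrayList<>();
      for (String prefix : results) {
        for (String candidate : suggestForPart(part)) {
          next.add(first ? candidate : prefix + "-" + candidate);
        }
      }
      results = next;
      first = false;
    }
    return results;
  }

  public static void main(String[] args) {
    System.out.println(suggestCompound("synio-zhovtyy")); // [synio-zhovtyi, synio-zhovta]
  }
}
```

The number of combinations grows with every part that has several
candidates, so a real implementation would probably want to cap it.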

As this change touches the core code, I wanted to have it reviewed here
first to make sure it is the right way to do it, and to see whether
anybody objects.

Thanks
Andriy

Index: languagetool-core/src/main/java/org/languagetool/rules/spelling/morfologik/MorfologikSpellerRule.java
===================================================================
--- languagetool-core/src/main/java/org/languagetool/rules/spelling/morfologik/MorfologikSpellerRule.java	(revision 10080)
+++ languagetool-core/src/main/java/org/languagetool/rules/spelling/morfologik/MorfologikSpellerRule.java	(working copy)
@@ -119,14 +119,19 @@
      return toRuleMatchArray(ruleMatches);
    }

+
+  protected boolean isMisspelled(MorfologikSpeller speller, String word) {
+    return speller.isMisspelled(word);
+  }
+
    private List<RuleMatch> getRuleMatch(final String word, final int startPos) {
      final List<RuleMatch> ruleMatches = new ArrayList<RuleMatch>();
-    if (speller.isMisspelled(word)) {
+    if (isMisspelled(speller, word)) {
        final RuleMatch ruleMatch = new RuleMatch(this, startPos, startPos
            + word.length(), messages.getString("spelling"),
            messages.getString("desc_spelling_short"));
        //If lower case word is not a misspelled word, return it as the only suggestion
-      if (!speller.isMisspelled(word.toLowerCase(conversionLocale))) {
+      if (!isMisspelled(speller, word.toLowerCase(conversionLocale))) {
          List<String> suggestion = Arrays.asList(word.toLowerCase(conversionLocale));
          ruleMatch.setSuggestedReplacements(suggestion);
          ruleMatches.add(ruleMatch);
Index: languagetool-language-modules/uk/src/main/java/org/languagetool/rules/uk/MorfologikUkrainianSpellerRule.java
===================================================================
--- languagetool-language-modules/uk/src/main/java/org/languagetool/rules/uk/MorfologikUkrainianSpellerRule.java	(revision 10080)
+++ languagetool-language-modules/uk/src/main/java/org/languagetool/rules/uk/MorfologikUkrainianSpellerRule.java	(working copy)
@@ -25,10 +25,12 @@

  import org.languagetool.Language;
  import org.languagetool.rules.spelling.morfologik.MorfologikSpellerRule;
+import org.languagetool.rules.spelling.morfologik.MorfologikSpeller;

  public final class MorfologikUkrainianSpellerRule extends MorfologikSpellerRule {

-  private static final String RESOURCE_FILENAME = "/uk/hunspell/uk_UA.dict";
+  private static final String COMPOUND_CHAR = "-";
+    private static final String RESOURCE_FILENAME = "/uk/hunspell/uk_UA.dict";
    private static final Pattern UKRAINIAN_LETTERS = Pattern.compile(".*[а-яіїєґА-ЯІЇЄҐ].*");

    public MorfologikUkrainianSpellerRule(ResourceBundle messages,
@@ -52,5 +54,21 @@
        return ! UKRAINIAN_LETTERS.matcher(word).matches() || super.ignoreWord(word);
    }

+  @Override
+  protected boolean isMisspelled(MorfologikSpeller speller, String word) {
+    if( ! super.isMisspelled(speller, word) )
+        return false;
+
+    if( word.contains(COMPOUND_CHAR) ) {
+        String[] words = word.split(COMPOUND_CHAR);
+        for(String singleWord: words) {
+            if( speller.isMisspelled(singleWord) )
+                return true;
+        }
+        return false;
+    }
+
+    return true;
+  }

  }

