improve arabic analyzer: light8 -> light10
------------------------------------------

                 Key: LUCENE-1758
                 URL: https://issues.apache.org/jira/browse/LUCENE-1758
             Project: Lucene - Java
          Issue Type: Improvement
          Components: contrib/analyzers
            Reporter: Robert Muir
            Priority: Minor
         Attachments: LUCENE-1758.txt

Someone mentioned on the java-user list that the Arabic analysis was not as 
good as they would like.

This patch adds stripping of the لل- prefix, which is the difference between 
the light8 and light10 algorithms. In the light10 paper, this change improves 
precision from 0.390 to 0.413. The authors note that the gain is not 
statistically significant, but it makes linguistic sense and has at least 
been shown not to hurt.
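
As a rough illustration of what the extra prefix does, here is a minimal 
sketch in Java. It is not the code from the attached patch and it does not 
use the contrib/analyzers classes. The class name, the two prefix lists, and 
the rule of keeping at least two characters after stripping are all 
assumptions made for this example. Strings are written as Unicode escapes so 
the file compiles regardless of source encoding; the sample word is للكتاب 
("for the book").

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: not the patch, not the Lucene ArabicStemmer.
public class LightPrefixSketch {

    // Assumed article-like prefixes WITHOUT lam-lam:
    // waw-alef-lam, beh-alef-lam, kaf-alef-lam, feh-alef-lam, alef-lam.
    static final List<String> WITHOUT_LAM_LAM = Arrays.asList(
            "\u0648\u0627\u0644", "\u0628\u0627\u0644",
            "\u0643\u0627\u0644", "\u0641\u0627\u0644",
            "\u0627\u0644");

    // The same assumed list plus lam-lam, the prefix this issue adds.
    static final List<String> WITH_LAM_LAM = Arrays.asList(
            "\u0648\u0627\u0644", "\u0628\u0627\u0644",
            "\u0643\u0627\u0644", "\u0641\u0627\u0644",
            "\u0627\u0644", "\u0644\u0644");

    // Assumption: never strip a prefix if fewer than two characters remain.
    static final int MIN_REMAINDER = 2;

    // Removes the first matching prefix, or returns the word unchanged.
    static String stripPrefix(String word, List<String> prefixes) {
        for (String prefix : prefixes) {
            if (word.startsWith(prefix)
                    && word.length() - prefix.length() >= MIN_REMAINDER) {
                return word.substring(prefix.length());
            }
        }
        return word;
    }

    public static void main(String[] args) {
        // lam-lam-kaf-teh-alef-beh: "for the book"
        String word = "\u0644\u0644\u0643\u062A\u0627\u0628";
        // Without lam-lam in the list, the word is left untouched.
        System.out.println(stripPrefix(word, WITHOUT_LAM_LAM));
        // With lam-lam, the prefix is removed, leaving kaf-teh-alef-beh.
        System.out.println(stripPrefix(word, WITH_LAM_LAM));
    }
}
```

With the longer list, للكتاب and الكتاب both reduce to كتاب, so a query for 
one form can match documents containing the other.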

In the future, I hope openrelevance will allow us to try more approaches.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
