[ https://issues.apache.org/jira/browse/SOLR-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716594#action_12716594 ]
Michael Ludwig commented on SOLR-1204:
--------------------------------------
Sorry for being inexact in my last comment. I wrote "any valid UTF-8 character"
when I meant "any valid UTF-8 letter", which is what the patch contains, and
which isn't equivalent to what I wrote.
In order to produce a correct patch, I need to know which field names are
legal. It can hardly be "any UTF-8 string", as that would also include the
colon, which is already used to delimit field names from query strings. What
about digits? Asterisk? Dash (minus)? Underscore? Space? Tab?
My original idea was to allow letters, digits and underscore. But this may not
be sufficient.
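
To make the question concrete, here is a minimal sketch of the two character
sets under discussion. It is not the attached patch: the class name
TokenSketch, the helper tokens(), and the sample query with the field name
"titel" are made up for illustration. It assumes "letters, digits and
underscore" translates to [\p{L}\p{Nd}_]+, which may well be too narrow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the code from the attached diff.
final class TokenSketch {
    // Default Java \w is ASCII only: [a-zA-Z_0-9].
    static final Pattern ASCII_WORD = Pattern.compile("\\w+");

    // Assumption: Unicode letters (\p{L}), decimal digits (\p{Nd}) and
    // underscore. The colon is not in the class, so it still separates a
    // field name from the query text.
    static final Pattern UNICODE_WORD = Pattern.compile("[\\p{L}\\p{Nd}_]+");

    // Collect every non-overlapping match of the pattern, left to right.
    static List<String> tokens(Pattern pattern, String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // "titel:Schöne Grüße", written with escapes to avoid depending on
        // the source file encoding.
        String query = "titel:Sch\u00f6ne Gr\u00fc\u00dfe";
        System.out.println(tokens(ASCII_WORD, query));   // splits at ö, ü, ß
        System.out.println(tokens(UNICODE_WORD, query)); // keeps words whole
    }
}
```

With the ASCII pattern, the non-ASCII letters act as separators and the words
are cut into fragments; with the Unicode pattern they stay intact. Asterisk,
dash, space and tab are excluded by both patterns, so whether any of them
should be legal in a field name remains the open question.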
> Enhance SpellingQueryConverter to handle UTF-8 instead of ASCII only
> --------------------------------------------------------------------
>
> Key: SOLR-1204
> URL: https://issues.apache.org/jira/browse/SOLR-1204
> Project: Solr
> Issue Type: Improvement
> Components: spellchecker
> Affects Versions: 1.3
> Reporter: Michael Ludwig
> Assignee: Shalin Shekhar Mangar
> Priority: Trivial
> Fix For: 1.4
>
> Attachments: SpellingQueryConverter.java.diff,
> SpellingQueryConverter.java.diff
>
>
> Solr - User - SpellCheckComponent: queryAnalyzerFieldType
> http://www.nabble.com/SpellCheckComponent%3A-queryAnalyzerFieldType-td23870668.html
> In the above thread, it was suggested to extend the SpellingQueryConverter to
> cover the full UTF-8 range instead of handling US-ASCII only. This might be
> as simple as changing the regular expression used to tokenize the input
> string to accept a sequence of one or more Unicode letters ( \p{L}+ ) instead
> of a sequence of one or more word characters ( \w+ ).
> See http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html for
> Java regular expression reference.