[
https://issues.apache.org/jira/browse/SOLR-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902600#action_12902600
]
Peter Karich edited comment on SOLR-2059 at 8/25/10 3:46 PM:
-------------------------------------------------------------
Oops, my mistake ... this helped!
> What do you think of the file format, is it ok for describing these
> categories?
I think it is OK. I even had a simpler patch before stumbling over yours:
handleAsChar="@#", which your file format now expresses more powerfully, IMHO:
{code}
@ => ALPHA
# => ALPHA
{code}
> Allow customizing how WordDelimiterFilter tokenizes text.
> ---------------------------------------------------------
>
> Key: SOLR-2059
> URL: https://issues.apache.org/jira/browse/SOLR-2059
> Project: Solr
> Issue Type: New Feature
> Components: Schema and Analysis
> Reporter: Robert Muir
> Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: SOLR-2059.patch
>
>
> By default, WordDelimiterFilter assigns 'types' to each character (computed
> from Unicode Properties).
> Based on these types and the options provided, it splits and concatenates
> text.
> In some circumstances, you might need to tweak this behavior.
> It seems the filter already had this in mind, since you can pass in a custom
> byte[] type table.
> But it's not exposed in the factory.
> I think you should be able to customize the defaults with a configuration
> file:
> {noformat}
> # A customized type mapping for WordDelimiterFilterFactory
> # the allowable types are: LOWER, UPPER, ALPHA, DIGIT, ALPHANUM, SUBWORD_DELIM
> #
> # the default for any character without a mapping is always computed from
> # Unicode character properties
> # Map the $, %, '.', and ',' characters to DIGIT
> # This might be useful for financial data.
> $ => DIGIT
> % => DIGIT
> . => DIGIT
> \u002C => DIGIT
> {noformat}
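
As a rough illustration of how a mapping file like the one above could be read, here is a minimal Java sketch. It is not the code from the attached patch: the class name TypeTableSketch, the numeric type codes, and the rule that a line starting with '#' counts as a comment only when it is not itself a valid mapping (so that "# => ALPHA" from the comment above still works) are all assumptions made for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr or of the attached patch.
class TypeTableSketch {
    // Assumed type codes for illustration only; the real filter's values may differ.
    static final Map<String, Byte> TYPES = new LinkedHashMap<>();
    static {
        TYPES.put("LOWER", (byte) 0x01);
        TYPES.put("UPPER", (byte) 0x02);
        TYPES.put("ALPHA", (byte) 0x03);
        TYPES.put("DIGIT", (byte) 0x04);
        TYPES.put("ALPHANUM", (byte) 0x07);
        TYPES.put("SUBWORD_DELIM", (byte) 0x08);
    }

    // "<char> => <TYPE>", with optional whitespace around the arrow.
    private static final Pattern MAPPING = Pattern.compile("^(\\S+)\\s*=>\\s*(\\w+)$");
    // A key written as a backslash-u escape followed by four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("^\\\\u([0-9a-fA-F]{4})$");

    // Parses mapping lines into a character-to-type-code table.
    static Map<Character, Byte> parse(List<String> lines) {
        Map<Character, Byte> table = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            Matcher m = MAPPING.matcher(line);
            if (m.matches()) {
                Byte type = TYPES.get(m.group(2));
                if (type == null) {
                    throw new IllegalArgumentException("Unknown type: " + m.group(2));
                }
                table.put(decodeKey(m.group(1)), type);
            } else if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank line or comment
            } else {
                throw new IllegalArgumentException("Invalid mapping: " + raw);
            }
        }
        return table;
    }

    // Accepts either a single literal character or a four-digit hex escape.
    static char decodeKey(String key) {
        Matcher esc = ESCAPE.matcher(key);
        if (esc.matches()) {
            return (char) Integer.parseInt(esc.group(1), 16);
        }
        if (key.length() == 1) {
            return key.charAt(0);
        }
        throw new IllegalArgumentException("Expected a single character: " + key);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "# Map some characters to DIGIT",
            "$ => DIGIT",
            "\\u002C => DIGIT",
            "@ => ALPHA",
            "# => ALPHA");
        System.out.println(parse(lines).keySet());
    }
}
```

Characters without a mapping are simply absent from the returned table, which matches the description above: the caller would fall back to the type computed from Unicode character properties.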