[ https://issues.apache.org/jira/browse/SOLR-211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492132 ]
Ryan McKinley commented on SOLR-211:
------------------------------------
>
> I don't know if your new PatternTokenizerFactory could replace either of
> these, though. For the first case, I still want the white space tokenization
> after I've stripped off all the junk I don't want. And for the second, I need
> to be able to do the remapping.
>
If you're really good with regular expressions, perhaps it could all be
combined... I'm not ;)
In my real use case, I use the general PatternTokenizerFactory to split the
input into a bunch of tokens, and then a custom (ugly!) TokenFilter
transforms the stream with other one-off transformations similar to what you
describe.
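
For illustration only, here is a rough plain-Java sketch of that two-stage
idea: split on a pattern first, then clean up each token in a separate pass.
It deliberately avoids the Lucene/Solr Tokenizer and TokenFilter classes, and
the delimiter pattern, the class name, and the cleanup rule (strip surrounding
punctuation, lowercase) are all made-up assumptions, not what the patch or my
real filter does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class SplitThenFilterSketch {
    // Hypothetical delimiter: runs of whitespace, commas, or semicolons.
    private static final Pattern DELIMITER = Pattern.compile("[\\s,;]+");

    // Stage 1 ("tokenizer"): the equivalent of input.split(regex),
    // dropping the empty leading piece that split() can produce.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        for (String piece : DELIMITER.split(input)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Stage 2 ("filter"): a one-off per-token transformation. Here it
    // strips leading/trailing punctuation and lowercases (an assumption).
    static List<String> filter(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            String cleaned = token
                    .replaceAll("^\\p{Punct}+|\\p{Punct}+$", "")
                    .toLowerCase(Locale.ROOT);
            if (!cleaned.isEmpty()) {
                out.add(cleaned);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [foo, bar, baz]
        System.out.println(filter(tokenize("  (Foo), bar;  --BAZ!! ")));
    }
}
```

Keeping the split and the cleanup as separate stages means neither regex has
to do everything, which is the main reason I have not tried to fold it all
into one pattern.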
> regex split() Tokenizer
> -----------------------
>
> Key: SOLR-211
> URL: https://issues.apache.org/jira/browse/SOLR-211
> Project: Solr
> Issue Type: New Feature
> Components: search
> Reporter: Ryan McKinley
> Assigned To: Ryan McKinley
> Attachments: SOLR-211-RegexSplitTokenizer.patch,
> SOLR-211-RegexSplitTokenizer.patch, SOLR-211-RegexSplitTokenizer.patch
>
>
> A TokenizerFactory that makes tokens from:
> string.split( regex );
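
As a reminder of what string.split( regex ) actually returns in Java, here is
a small sketch (the class and method names are hypothetical, and it says
nothing about how the attached patch handles these cases). Trailing empty
strings are dropped by default, but an empty leading string is kept when the
input starts with a delimiter, so a tokenizer built on split() may want to
skip empty pieces.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitSemanticsSketch {
    // Same result as input.split(regex): trailing empty strings removed.
    static String[] splitDefault(String input, String regex) {
        return Pattern.compile(regex).split(input);
    }

    // A negative limit keeps trailing empty strings as well.
    static String[] splitKeepTrailing(String input, String regex) {
        return Pattern.compile(regex).split(input, -1);
    }

    public static void main(String[] args) {
        // Prints [, a, , b]
        System.out.println(Arrays.toString(splitDefault(",a,,b,,", ",")));
        // Prints [, a, , b, , ]
        System.out.println(Arrays.toString(splitKeepTrailing(",a,,b,,", ",")));
    }
}
```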