Re: [Simplified my question] How to enhance solr.StandardTokenizerFactory? (was: Why is Standard Tokenizer not separating at this comma?)

2017-05-24 Thread Steve Rowe

Hi Robert, Two possibilities come to mind: 1. Use a char filter factory (runs before the tokenizer) to convert commas between digits to spaces, e.g. PatternReplaceCharFilterFactory
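
To make the first suggestion concrete, here is a minimal Python sketch of the kind of rewrite such a char filter would perform before tokenization: a comma with a digit on each side becomes a space, and every other comma is left alone. The regular expression and the function name are illustrative assumptions, not taken from the thread. In Solr, an equivalent expression would presumably go into the `pattern` and `replacement` attributes of PatternReplaceCharFilterFactory, but that configuration is not shown here.

```python
import re

# Assumed pattern (not from the thread): a comma that has a digit
# immediately before it and a digit immediately after it.
COMMA_BETWEEN_DIGITS = re.compile(r"(?<=\d),(?=\d)")


def replace_digit_commas(text: str) -> str:
    """Replace each comma sitting directly between two digits with a space."""
    return COMMA_BETWEEN_DIGITS.sub(" ", text)


if __name__ == "__main__":
    # Commas inside the digit run are replaced; the one after "apples" is kept.
    print(replace_digit_commas("1,234,567 apples, pears"))
```

Lookarounds are used so that the digits themselves are not consumed, which lets adjacent groups such as `1,234,567` be rewritten in a single pass.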

Re: Why is Standard Tokenizer not separating at this comma?

2017-05-24 Thread Steve Rowe

Hi Robert, The StandardTokenizer implements the word boundary rules from UAX#29, discarding anything between boundaries that is exclusively non-alphanumeric (e.g. punctuation). -- Steve www.lucidworks.com > On May 24, 2017, at 3:05 PM,
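
As a rough illustration of why a comma between digits does not produce a split: UAX#29 rules WB11 and WB12 forbid a word break around a numeric separator such as `,` or `.` when it sits between two digits. The Python sketch below only approximates that one behaviour with a regular expression; it is not the Lucene implementation and ignores most of UAX#29 (for example, it splits `a.b`, which the real rules keep together). The function name is hypothetical.

```python
import re

# Rough approximation, NOT the real UAX#29 algorithm: a token is a run of
# word characters, where a ',' or '.' is also allowed inside the run if it
# has a digit on both sides (mimicking rules WB11/WB12).
TOKEN = re.compile(r"(?:\w|(?<=\d)[.,](?=\d))+")


def approx_tokens(text: str) -> list[str]:
    """Return approximate tokens; punctuation-only stretches are discarded."""
    return TOKEN.findall(text)


if __name__ == "__main__":
    print(approx_tokens("IDs 123,456 and foo, bar"))
```

Under this approximation `123,456` stays a single token while `foo, bar` splits at the comma, which matches the behaviour the thread title asks about.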

[Simplified my question] How to enhance solr.StandardTokenizerFactory? (was: Why is Standard Tokenizer not separating at this comma?)

2017-05-24 Thread Robert Hume

Hi, Following up on my last email question ... I've learned more and I simplified my question ... I have a Solr 3.6 deployment. Currently I'm using solr.StandardTokenizerFactory to parse tokens during indexing. Here are two example streams that demonstrate my issue: Example 1:

Why is Standard Tokenizer not separating at this comma?

2017-05-24 Thread Robert Hume

I have a Solr 3.6 deployment I inherited. The schema.xml specifies the use of StandardTokenizerFactory like so ... ... ... According to this reference guide (https://home.apache.org/~ctargett/RefGuidePOC/jekyll/Tokenizers.html) ... the StandardTokenizer will treat