Re: Creating CJK bigram tokens with ClassicTokenizer

2018-10-03 Thread Yasufumi Mizoguchi
a/org/apache/lucene/analysis/cjk/CJKBigramFilter.java#L64 ) ClassicTokenizer also adds the obsolete TOKEN_TYPES value "CJ" to CJ tokens and "ALPHANUM" to Korean alphabet tokens, but neither is a target for CJKBigramFilter... Thanks, Yasufumi On Tue, Oct 2, 2018 at 0:05, Shawn Heisey wrote: > On 9/3

Re: Creating CJK bigram tokens with ClassicTokenizer

2018-10-01 Thread Shawn Heisey
On 9/30/2018 10:14 PM, Yasufumi Mizoguchi wrote: I am looking for a way to create CJK bigram tokens with ClassicTokenizer. I tried using CJKBigramFilter, but it only supports StandardTokenizer... CJKBigramFilter shouldn't care what tokenizer you're using. It should work

Creating CJK bigram tokens with ClassicTokenizer

2018-09-30 Thread Yasufumi Mizoguchi
Hi, I am looking for a way to create CJK bigram tokens with ClassicTokenizer. I tried using CJKBigramFilter, but it only supports StandardTokenizer... So, is there a good way to do that? Thanks, Yasufumi
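For readers unfamiliar with the term, here is a minimal sketch of what "CJK bigram tokens" means: each run of CJK characters is split into overlapping two-character tokens. This is not Lucene's CJKBigramFilter, which decides what to bigram from the token types set by the tokenizer; the function name `cjk_bigrams` and the character ranges below are assumptions made only for illustration.

```python
import re

# Assumed, simplified character ranges (Hiragana, Katakana, common Han, Hangul syllables).
# Lucene's CJKBigramFilter relies on tokenizer-assigned token types instead of a regex.
CJK_RUN = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7a3]+")


def cjk_bigrams(text):
    """Return overlapping two-character tokens for every CJK run in text."""
    tokens = []
    for run in CJK_RUN.findall(text):
        if len(run) == 1:
            # A lone CJK character has no neighbor, so it is kept as a unigram.
            tokens.append(run)
        else:
            # Slide a two-character window across the run.
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


print(cjk_bigrams("東京都"))
```

In this sketch non-CJK text is simply ignored; in a real analysis chain those tokens would pass through the filter unchanged.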

Re: ClassicTokenizer

2018-01-11 Thread Steve Rowe
y have no idea. Those are Lucene classes, not Solr. Maybe someone > who was around for whatever discussions happened on Lucene lists back in > those days will comment. > > I wasn't able to find the issue where ClassicTokenizer was created, and I > couldn't find any informati

Re: ClassicTokenizer

2018-01-10 Thread Shawn Heisey
1 to break on hyphens, when it seems to me to work better the old way? I really have no idea. Those are Lucene classes, not Solr. Maybe someone who was around for whatever discussions happened on Lucene lists back in those days will comment. I wasn't able to find the issue where ClassicTokenizer w

Re: ClassicTokenizer

2018-01-10 Thread Rick Leir
me to work better the old way? Thanks Rick On January 9, 2018 7:07:59 PM EST, Shawn Heisey <apa...@elyograg.org> wrote: >On 1/9/2018 9:36 AM, Rick Leir wrote: >> A while ago the default was changed to StandardTokenizer from >ClassicTokenizer. The biggest difference seems to be

Re: ClassicTokenizer

2018-01-09 Thread Shawn Heisey
On 1/9/2018 9:36 AM, Rick Leir wrote: > A while ago the default was changed to StandardTokenizer from > ClassicTokenizer. The biggest difference seems to be that Classic does not > break on hyphens. There is also a different character pr(mumble). I prefer > the Classic's non-brea

ClassicTokenizer

2018-01-09 Thread Rick Leir
Hi all, A while ago the default was changed from ClassicTokenizer to StandardTokenizer. The biggest difference seems to be that Classic does not break on hyphens. There is also a different character pr(mumble). I prefer Classic's behavior of not breaking on hyphens. What was the reason for changing

What we lose if we use ClassicTokenizer instead of StandardTokenizer

2012-06-19 Thread Alok Bhandari
Hello, I need to know what I will lose if I use ClassicTokenizer instead of StandardTokenizer. Will ClassicTokenizer be deprecated in future Solr versions, or is development on ClassicTokenizer going to halt? Please let me know. -- View this message

Re: What we lose if we use ClassicTokenizer instead of StandardTokenizer

2012-06-19 Thread Erick Erickson
need to know what I will lose if I use ClassicTokenizer instead of StandardTokenizer. Will ClassicTokenizer be deprecated in future Solr versions, or is development on ClassicTokenizer going to halt? Please let me know. -- View this message in context

Re: What we lose if we use ClassicTokenizer instead of StandardTokenizer

2012-06-19 Thread Alok Bhandari
-ClassicTokenizer-instead-of-StandardTokenizer-tp3990249p3990278.html Sent from the Solr - User mailing list archive at Nabble.com.