Re: Two questions on RussianAnalyzer

Vladimir Gubarkov Thu, 19 Apr 2012 13:51:58 -0700

Thank you, Robert, for the detailed reply.

On Fri, Apr 20, 2012 at 12:37 AM, Robert Muir <[email protected]> wrote:
> On Thu, Apr 19, 2012 at 7:26 AM, Vladimir Gubarkov <[email protected]> wrote:
>> New analyzer:
>> [aaa.bbb.com, 8888, a, b, c, d'e, f, g, h, i, j, k, l_m, n, o, p, q,
>> r, s, t, u, v, z, y, z]
>> Old analyzer:
>> [aaa, bbb, com, 8888, a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p,
>> q, r, s, t, u, v, z, y, z]
>>
>> Please note the differences.
>
> Right, the tokenizer has changed. This is mentioned in the javadocs:
> http://lucene.apache.org/core/3_6_0/api/contrib-analyzers/org/apache/lucene/analysis/ru/RussianAnalyzer.html
>
>>
>> What bothers me most about the new behaviour is that in the past I
>> could search by subdomain, like
>> bbb.com:8888
>> and get results for www.bbb.com:8888, aaa.bbb.com:8888 and
>> so on. Now I get 0 results.
>
> Don't simply set your version parameter to 3.6 without reindexing.
> This is really important!!!!!!!!!!!
> Otherwise it defeats the whole purpose.
>


Hmmm... I know this, and I did reindex!
I'll try to explain the problem once again (fortunately, it is already
solved by using LUCENE_30):
When indexing with the new analyzer, the whole lexeme "some.cool.site.com"
goes into the index, not the 4 lexemes "some", "cool", "site", "com".
So it's now impossible to find this document with the query "site.com".
I have an RSS subscription for that search, and now it's broken.
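
To make the mismatch concrete, here is a small self-contained sketch. It
does not use Lucene at all: the class TokenizerSketch and its methods are
hypothetical names, and the two tokenizers are toy approximations of the
old and new behaviour (not the real implementations), just enough to
show why "site.com" stops matching.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical toy model; NOT the Lucene tokenizers themselves.
class TokenizerSketch {

    // Shared loop: a character stays in the current token if it is a letter
    // or digit, or (when keepInnerDots is set) a '.' that sits directly
    // between two letters/digits.
    private static List<String> tokenize(String text, boolean keepInnerDots) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean innerDot = keepInnerDots && c == '.'
                    && i > 0 && i + 1 < text.length()
                    && Character.isLetterOrDigit(text.charAt(i - 1))
                    && Character.isLetterOrDigit(text.charAt(i + 1));
            if (Character.isLetterOrDigit(c) || innerDot) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Approximates the old behaviour: every run of letters/digits is a token.
    static List<String> oldStyle(String text) {
        return tokenize(text, false);
    }

    // Approximates the new behaviour: dotted names stay in one token.
    static List<String> newStyle(String text) {
        return tokenize(text, true);
    }

    // A query matches if its tokens occur as a contiguous run in the document.
    static boolean matches(List<String> docTokens, List<String> queryTokens) {
        return !queryTokens.isEmpty()
                && Collections.indexOfSubList(docTokens, queryTokens) >= 0;
    }

    public static void main(String[] args) {
        String doc = "some.cool.site.com";
        String query = "site.com";
        System.out.println("old: " + oldStyle(doc) + " vs " + oldStyle(query)
                + " -> " + matches(oldStyle(doc), oldStyle(query)));
        System.out.println("new: " + newStyle(doc) + " vs " + newStyle(query)
                + " -> " + matches(newStyle(doc), newStyle(query)));
    }
}
```

With the old-style split the query tokens "site", "com" are a sub-sequence
of the document tokens, so the document is found. With the new-style split
the index holds the single token "some.cool.site.com" and the query is the
single token "site.com", so nothing matches even after a full reindex.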

