Re: issue with singleton analyzer in single JVM multi-index setup

[email protected] Wed, 18 Mar 2015 09:23:13 -0700

Is it possible to examine the code of your plugin?

Generally speaking, analyzers are instantiated per index when the index is
created, and for each thread.
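A minimal sketch in plain Java (Java 8+) of the "one shared factory, one instance per thread" pattern described here. `StatefulTokenizer` and `TokenizerFactory` are hypothetical stand-ins, not Elasticsearch or Lucene classes. The factory is shared, and each thread lazily gets its own non-thread-safe component through a `ThreadLocal`.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a non-thread-safe tokenizer: it reuses a mutable buffer,
// so two threads sharing one instance could corrupt each other's output.
class StatefulTokenizer {
    private final StringBuilder buffer = new StringBuilder();

    String normalize(String token) {
        buffer.setLength(0);
        buffer.append(token.toLowerCase());
        return buffer.toString();
    }
}

// Hypothetical factory: one shared instance that lazily gives each thread its own tokenizer.
final class TokenizerFactory {
    // Counts how many tokenizers have been created (for demonstration only).
    static final AtomicInteger CREATED = new AtomicInteger();

    private final ThreadLocal<StatefulTokenizer> perThread = ThreadLocal.withInitial(() -> {
        CREATED.incrementAndGet();
        return new StatefulTokenizer();
    });

    // Returns the calling thread's tokenizer, creating it on first use.
    StatefulTokenizer tokenizer() {
        return perThread.get();
    }
}

public class PerThreadAnalyzerDemo {
    public static void main(String[] args) throws InterruptedException {
        TokenizerFactory factory = new TokenizerFactory(); // shared by all threads
        Runnable work = () -> System.out.println(factory.tokenizer().normalize("Twitter"));
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // Each thread received its own tokenizer, so two were created in total.
        System.out.println("tokenizers created: " + TokenizerFactory.CREATED.get());
    }
}
```

Run in a fresh JVM, the last line printed is `tokenizers created: 2`: the factory exists once, but every thread that asks for a tokenizer gets a separate instance.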


In org.elasticsearch.index.analysis.AnalysisModule, you can see how
analyzer providers and factories are prepared for injection with the help of
the ES injection module, which is based on Guice. Basically, the factories
are kept as singletons, and each thread can pick analyzer instances from
the factory when needed. Note that Lucene analyzer classes are not
thread-safe, in particular the tokenizers. This means it is up to the
implementor of an analyzer/tokenizer to store immutable objects as
singletons correctly, so that all threads can safely access them.
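A minimal sketch in plain Java (Java 8+) of one way to share an immutable, expensive-to-load object across all analyzer instances: the initialization-on-demand holder idiom. `Lemmatizer`, `LemmatizingAnalyzer` and the tiny dictionary are hypothetical stand-ins for the custom lemmatizer in the plugin, not Elasticsearch or Lucene API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical immutable lemmatizer: its state is fixed at construction,
// so a single instance can be read safely by any number of threads.
final class Lemmatizer {
    // Counts how often the (expensive) dictionary is loaded, for demonstration only.
    static final AtomicInteger LOADS = new AtomicInteger();

    private final Map<String, String> lemmas;

    private Lemmatizer() {
        LOADS.incrementAndGet();
        Map<String, String> m = new HashMap<>();
        m.put("running", "run"); // stand-in for loading a large dictionary from disk
        m.put("ran", "run");
        this.lemmas = Collections.unmodifiableMap(m);
    }

    String lemma(String word) {
        return lemmas.getOrDefault(word, word);
    }

    // Initialization-on-demand holder: the JVM initializes Holder lazily,
    // exactly once, with thread safety guaranteed by class initialization.
    private static final class Holder {
        static final Lemmatizer INSTANCE = new Lemmatizer();
    }

    static Lemmatizer getInstance() {
        return Holder.INSTANCE;
    }
}

// Hypothetical analyzer: many instances (e.g. one per index), one shared lemmatizer.
final class LemmatizingAnalyzer {
    private final Lemmatizer lemmatizer = Lemmatizer.getInstance();

    String analyze(String word) {
        return lemmatizer.lemma(word.toLowerCase());
    }
}

public class SharedLemmatizerDemo {
    public static void main(String[] args) {
        for (int i = 0; i < 14; i++) {
            new LemmatizingAnalyzer(); // e.g. one analyzer per index
        }
        System.out.println("lemmatizer loads: " + Lemmatizer.LOADS.get());
    }
}
```

Running `main` prints `lemmatizer loads: 1`: no explicit synchronized block is needed, because the JVM runs `Holder`'s initializer only once. One caveat, to be checked rather than assumed: a static field is unique per class loader, not per JVM, so if the plugin classes were loaded by more than one class loader, each would hold its own copy.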

Jörg

On Wed, Mar 18, 2015 at 4:02 PM, Dmitry Kan <[email protected]> wrote:

> Hi,
>
> Could somebody answer, please?
>
>
> On Tuesday, 17 March 2015 19:05:38 UTC+2, Dmitry Kan wrote:
>>
>> Hello!
>>
>> I'm a newbie in Elasticsearch, so forgive me if the question is lame.
>>
>> I have implemented a custom plugin using a custom lemmatizer and a
>> tokenizer. The simplified class sequence:
>>
>>
>> AnalysisMorphologyPlugin->MorphologyAnalysisBinderProcessor->SemanticAnalyzerTwitterLemmatizerProvider->RussianLemmatizingTwitterAnalyzer
>>
>> In the RussianLemmatizingTwitterAnalyzer's ctor I load the custom object for 
>> lemmatization (an object unrelated to Lucene/ES) in a singleton fashion (in a 
>> synchronized code block).
>> Then, when creating 14 indices in the same JVM, I see
>>  14 instances of RussianLemmatizingTwitterAnalyzer,
>>  4 instances of SemanticAnalyzerTwitterLemmatizerProvider,
>>  4 instances of MorphologyAnalysisBinderProcessor,
>>  30 instances of the custom lemmatizer (only one instance is expected per 
>> RussianLemmatizingTwitterAnalyzer, so there should be 14),
>>  1 instance of AnalysisMorphologyPlugin.
>>
>> The question is: can the RussianLemmatizingTwitterAnalyzer object be shared 
>> between indices? Or is it by design that it must be loaded separately per 
>> index?
>> What could be wrong in the code that produces 30 instances of the custom 
>> singleton lemmatizer instead of 14?
>>
>> Currently, *with* the plugin the JVM reserves 100 MB of RAM with no data. 
>> *Without* the plugin the JVM reserves 2 MB with no data. 
>> Elasticsearch 1.3.2, Lucene 4.9.0.
>>
>> Regards,
>>
>> Dmitry Kan
>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/c2c57184-ee3b-4600-9091-a515b496b867%40googlegroups.com
> <https://groups.google.com/d/msgid/elasticsearch/c2c57184-ee3b-4600-9091-a515b496b867%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>
