Hello!

I'm new to Elasticsearch, so forgive me if the question is a basic one.

I have implemented a custom analysis plugin that uses a custom lemmatizer and a 
tokenizer. The simplified class chain is:

AnalysisMorphologyPlugin -> MorphologyAnalysisBinderProcessor -> SemanticAnalyzerTwitterLemmatizerProvider -> RussianLemmatizingTwitterAnalyzer

In the RussianLemmatizingTwitterAnalyzer constructor I load the custom 
lemmatization object (my own code, unrelated to Lucene/ES) in a singleton 
fashion, inside a synchronized block; a simplified sketch of that pattern is 
shown after the list below.
Then, when creating 14 indices in the same JVM, I see:
 14 instances of RussianLemmatizingTwitterAnalyzer,
 4 instances of SemanticAnalyzerTwitterLemmatizerProvider,
 4 instances of MorphologyAnalysisBinderProcessor,
 30 instances of the custom lemmatizer (each RussianLemmatizingTwitterAnalyzer should hold only one, so I expected 14),
 1 instance of AnalysisMorphologyPlugin.
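
To make the "singleton fashion" part concrete, here is a simplified sketch of the loading pattern (the real lemmatizer is my own class, unrelated to Lucene/ES; "Lemmatizer" and "LemmatizerHolder" are placeholder names, not the real ones):

// Placeholder for my real lemmatization class (unrelated to Lucene/ES).
class Lemmatizer {
    Lemmatizer() {
        // in the real class: loads dictionaries into RAM (expensive)
    }
}

// The RussianLemmatizingTwitterAnalyzer constructor calls LemmatizerHolder.get();
// the intent is one Lemmatizer per JVM, shared by all analyzer instances.
final class LemmatizerHolder {
    private static volatile Lemmatizer instance;

    private LemmatizerHolder() {}

    static Lemmatizer get() {
        if (instance == null) {
            synchronized (LemmatizerHolder.class) {
                if (instance == null) {
                    instance = new Lemmatizer();
                }
            }
        }
        return instance;
    }
}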

My questions are: can the RussianLemmatizingTwitterAnalyzer object be shared 
between indices, or is it by design that each index must load its own? And what 
could be wrong in the code that produces 30 instances of the supposedly 
singleton lemmatizer instead of 14?

The current situation is that *with* the plugin the JVM reserves 100 MB of RAM 
with no data indexed, while *without* the plugin it reserves 2 MB. 
Versions: Elasticsearch 1.3.2, Lucene 4.9.0.

Regards,

Dmitry Kan
