We are indexing a lot of similar texts using Lucene analyzers.
From our performance tests we see that analysis (converting the text into a 
TokenStream object) is taking more time than we would like.
Before digging into the analysis code, I was thinking about caching the 
analysis result, since we index many repeated texts at different times.
The basic idea is to serialize the TokenStream and store it in a DB. When we 
encounter the same text again, we would load it and initialize an analyzer with 
the loaded TokenStream.
In this context:
1 - Is it "safe" to serialize a TokenStream?
2 - Is there existing code that already serializes a TokenStream?
3 - How can we initialize an analyzer with an existing TokenStream?
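To make the idea concrete, here is a rough sketch of the round trip we have in mind. This is plain Java with a hand-rolled CachedToken class standing in for the per-token state we would read off Lucene's attribute API (CharTermAttribute, OffsetAttribute, PositionIncrementAttribute), so it illustrates the caching pattern only, not actual Lucene code. (I did see that Lucene has a CachingTokenFilter, but as far as I can tell it only caches tokens in memory within a single analysis, not across runs.)

```java
import java.io.*;
import java.util.*;

// Stand-in for the per-token state we would capture while consuming a
// Lucene TokenStream. This is our own class, not a Lucene type.
class CachedToken implements Serializable {
    final String term;
    final int startOffset;
    final int endOffset;
    final int positionIncrement;

    CachedToken(String term, int startOffset, int endOffset, int positionIncrement) {
        this.term = term;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
        this.positionIncrement = positionIncrement;
    }
}

public class TokenCache {
    // Serialize a consumed token list to bytes, e.g. for storage in a DB blob.
    static byte[] serialize(List<CachedToken> tokens) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(new ArrayList<>(tokens));
        }
        return bos.toByteArray();
    }

    // Load the token list back; a replay TokenStream would then emit
    // these tokens one by one from incrementToken().
    @SuppressWarnings("unchecked")
    static List<CachedToken> deserialize(byte[] bytes)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (List<CachedToken>) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        List<CachedToken> tokens = Arrays.asList(
                new CachedToken("hello", 0, 5, 1),
                new CachedToken("world", 6, 11, 1));
        byte[] blob = serialize(tokens);          // what we would store in the DB
        List<CachedToken> restored = deserialize(blob);
        System.out.println(restored.get(1).term); // prints "world"
    }
}
```

On the replay side, my understanding is that Lucene lets you hand a pre-built TokenStream directly to a field (e.g. the TextField(String, TokenStream) constructor), so we would bypass the analyzer rather than initialize one with the cached tokens - but please correct me if that is wrong.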

Thanks!

Best,
Omri