Hi,

Reading information from the inverted index (and also from vectors) is always slow, because the data is not stored "as is" for easy retrieval. To allow easy reindexing, the input data must be serialized to a "stored" field in parallel to the indexed value.

Elasticsearch takes the approach of having a single, separate "stored only" binary field in the index that contains the "_source" data of the whole document in a machine-readable format (JSON/CBOR/SMILE). When a document is updated in the index, the updater reads the original source, applies the updates to it and then reindexes the document. All other fields in Elasticsearch are not stored (unless you explicitly opt in).
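
The read/apply/reindex cycle described above can be sketched roughly as follows. This is only an illustration, not Elasticsearch's actual code: the class SourceMerge and the method applyChanges are hypothetical names, and parsing the stored source into a map and reindexing the result are assumed to happen elsewhere.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: merges partial change data into a previously stored source document. */
final class SourceMerge {
    private SourceMerge() {}

    /**
     * Returns a new map holding every field of the stored source, with the
     * fields present in the change data overwritten or added. Fields that are
     * not mentioned in the change data (for example a vector) are kept as stored.
     */
    static Map<String, Object> applyChanges(Map<String, Object> storedSource,
                                            Map<String, Object> changes) {
        Map<String, Object> merged = new LinkedHashMap<>(storedSource);
        merged.putAll(changes);
        return merged;
    }
}
```

The merged map is what would be serialized again and reindexed as the new version of the document, so the change data never has to carry the vector.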

In Solr it is very similar, but there the stored values are serialized to companion fields with the same name. There is currently no separate Lucene StoredField implementation for storing vectors, but it's easy to do: you could use a binary (byte[]) stored field to preserve the vector data (e.g., serialized in little or big endian).
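
A minimal sketch of such a serialization, using only the JDK. VectorCodec is a hypothetical helper, not part of Lucene, and little endian is chosen here arbitrarily; any fixed byte order works as long as reading and writing agree.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper (not part of Lucene): converts a float vector to bytes and back. */
final class VectorCodec {
    private VectorCodec() {}

    /** Serializes the vector as consecutive 32-bit floats in little-endian order. */
    static byte[] toBytes(float[] vector) {
        ByteBuffer buf = ByteBuffer.allocate(vector.length * Float.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        // The float view inherits the byte order set on the buffer above.
        buf.asFloatBuffer().put(vector);
        return buf.array();
    }

    /** Restores a vector from bytes produced by toBytes. */
    static float[] fromBytes(byte[] bytes) {
        if (bytes.length % Float.BYTES != 0) {
            throw new IllegalArgumentException("length is not a multiple of 4: " + bytes.length);
        }
        float[] vector = new float[bytes.length / Float.BYTES];
        ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).asFloatBuffer().get(vector);
        return vector;
    }
}
```

On the Lucene side, the resulting byte[] could be added to the document with new StoredField("vector_stored", bytes) next to the indexed vector field; the field name "vector_stored" is only an example. When reading back, the stored value comes as a BytesRef, so its offset and length have to be respected when copying the bytes out before calling fromBytes.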

I tend to favour the Elasticsearch approach of having a single stored field containing the whole document in machine-readable form.

Uwe

On 11.02.2024 at 13:39, Uthra wrote:
Hi Michael,
        The use case is to handle index updates along with its vector field 
without resending the vector in change data every time. The change data will 
consist of only “updated_field(s):value(s)” wherein I will read the vector 
value from Index to update the document.

Thanks,
Uthra

On 09-Feb-2024, at 7:13 PM, Michael Wechner <michael.wech...@wyona.com> wrote:

Can you describe your use case in more detail (beyond having to read the 
vectors)?

Thanks

Michael

On 09.02.24 at 12:28, Uthra wrote:
Hi,
        Our project uses Lucene 9.7.0 and we have a requirement for frequent 
vector read operations from the index for a set of documents. We tried two 
approaches:
1. Index the vector as a stored field and retrieve it whenever needed using the 
StoredFields APIs.
2. Use the LeafReader API to read the vector. Here, random access to 
documents is very slow.
Which one is the right approach, and can you suggest a better one? Also, 
why isn’t there a straightforward API like the StoredFields API to read vectors?

Regards,
Uthra

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de

