Hi,
reading data back from the inverted index (and also from vectors) is always
slow, because the data is not stored "as is" for easy re-consumption. To
allow easy reindexing, the input data must be serialized to a "stored"
field in parallel to the indexed value.
Elasticsearch takes the approach of having a single, separate "stored
only" binary field in the index that contains the "_source" data of the
whole document in a machine-readable format (JSON/CBOR/SMILE). When a
document is updated in the index, the updater reads the original source,
applies the updates to it and then reindexes the document. All other fields
in Elasticsearch are not stored (unless you explicitly opt in to
that).
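The read/apply/reindex cycle can be sketched roughly like this. It is only
an illustration of the idea, not Elasticsearch's actual code: the class and
method names are made up, the source is modelled as a plain Map, and the
merge is a shallow one (top-level fields only).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: applies a partial update on top of the stored source. */
final class SourceUpdate {
    private SourceUpdate() {}

    /**
     * Returns a new map holding the stored source with the changed fields
     * overwritten. Fields missing from the change data (e.g. the vector) are
     * carried over untouched. The result is what would be reindexed as a
     * complete document.
     */
    static Map<String, Object> applyPartialUpdate(
            Map<String, Object> storedSource, Map<String, Object> changes) {
        Map<String, Object> merged = new LinkedHashMap<>(storedSource);
        merged.putAll(changes); // shallow merge: top-level fields only
        return merged;
    }
}
```

The point is that the change data only needs to carry the updated fields;
everything else, including the vector, comes from the stored copy.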
In Solr it is very similar, but there the stored values are serialized
to companion fields with the same name. There is currently no separate
Lucene StoredField implementation to store vectors, but it's easy to
do: you could use a binary (byte[]) stored field to preserve the vector
data (e.g., serialized in little or big endian).
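A minimal sketch of such a serialization, using only the JDK. VectorCodec is
a made-up helper name (it is not part of Lucene), and little endian is just
one possible choice; what matters is that writer and reader agree.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper (not part of Lucene): float[] <-> byte[], little endian. */
final class VectorCodec {
    private VectorCodec() {}

    /** Serializes the vector as consecutive 4-byte IEEE 754 floats. */
    static byte[] toBytes(float[] vector) {
        ByteBuffer buf = ByteBuffer.allocate(vector.length * Float.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.asFloatBuffer().put(vector); // view shares the backing array
        return buf.array();
    }

    /**
     * Decodes a vector from a slice of a byte array. The offset/length form
     * matches how a stored binary value is typically handed back
     * (bytes, offset, length).
     */
    static float[] fromBytes(byte[] bytes, int offset, int length) {
        if (length % Float.BYTES != 0) {
            throw new IllegalArgumentException("length is not a multiple of 4: " + length);
        }
        float[] vector = new float[length / Float.BYTES];
        ByteBuffer.wrap(bytes, offset, length)
                .order(ByteOrder.LITTLE_ENDIAN)
                .asFloatBuffer()
                .get(vector);
        return vector;
    }
}
```

At index time the bytes would go into a stored-only field next to the
indexed vector field, e.g. new StoredField("vector_stored",
VectorCodec.toBytes(v)), where "vector_stored" is just an example name. On
read, the BytesRef returned by Document.getBinaryValue("vector_stored")
provides the bytes, offset and length to pass to fromBytes.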
I tend to favour the Elasticsearch approach of having a single stored
field containing the whole document in machine-readable form.
Uwe
On 11.02.2024 at 13:39, Uthra wrote:
Hi Michael,
The use case is to handle index updates for documents with a vector field
without resending the vector in the change data every time. The change data will
consist of only “updated_field(s):value(s)”, and I will read the vector
value from the index to update the document.
Thanks,
Uthra
On 09-Feb-2024, at 7:13 PM, Michael Wechner <michael.wech...@wyona.com> wrote:
Can you describe your use case in more detail (beyond having to read the
vectors)?
Thanks
Michael
On 09.02.24 at 12:28, Uthra wrote:
Hi,
Our project uses Lucene 9.7.0 and we have a requirement for frequent
vector read operations from the index for a set of documents. We tried two
approaches:
1. Index the vector as a stored field and retrieve it whenever needed using the
StoredFields APIs.
2. Use the LeafReader API to read the vector. Here, random access to
documents is very slow.
Which one is the right approach, and can you suggest a better one? Also,
why isn’t there a straightforward API like the StoredFields API to read vectors?
Regards,
Uthra
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de