Re: [I] Try out a tantivy's term dictionary format [lucene]

via GitHub Thu, 14 Dec 2023 20:20:32 -0800


dungba88 commented on issue #12513:
URL: https://github.com/apache/lucene/issues/12513#issuecomment-1857252458

I'm still catching up on this thread, so pardon me if I ask something that has
already been discussed.

> Yes, I actually tried to use FSTPostingsFormat in the benchmarks game and
I had to increase the heap size from 4g to 32g to work around the in-heap memory
demand

Are you referring to the `FSTTermsWriter` case for building, as
`FSTTermsReader` is [already
off-heap](https://github.com/apache/lucene/blob/main/lucene/codecs/src/java/org/apache/lucene/codecs/memory/FSTTermsReader.java#L196)?
If so, the recent change that streams the FST to disk while it is being built could help:
https://github.com/apache/lucene/pull/12624. We could plug the `out` IndexOutput
already created in `FSTTermsWriter` into the `FieldMetadata.FSTCompiler`. One catch
is that we need another IndexOutput for storing the FST metadata, as it's not
possible to write the metadata into the same IndexOutput as the main FST body :(
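To make the "separate output for the metadata" point concrete, here is a toy Java model. It is not Lucene's `FSTCompiler`/`IndexOutput` API: entries are streamed to a body output as they are added, and the metadata (here just an entry count and the body size, hypothetical stand-ins for the real FST metadata) is only known at `finish()`, so it goes to a second output that the reader loads first. All class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Toy model of the "body + separate metadata output" pattern.
// Hypothetical names; this is NOT Lucene's FSTCompiler/IndexOutput API.
final class TwoOutputSketch {

  // Metadata that is only known once the last entry has been written.
  record Metadata(int numEntries, int numBodyBytes) {
    void save(DataOutputStream metaOut) throws IOException {
      metaOut.writeInt(numEntries);
      metaOut.writeInt(numBodyBytes);
      metaOut.flush();
    }

    static Metadata load(DataInputStream metaIn) throws IOException {
      return new Metadata(metaIn.readInt(), metaIn.readInt());
    }
  }

  // Writes each entry straight to the body output instead of buffering it on heap.
  static final class StreamingBuilder {
    private final DataOutputStream bodyOut;
    private int numEntries;

    StreamingBuilder(DataOutputStream bodyOut) { this.bodyOut = bodyOut; }

    void add(String term) throws IOException {
      byte[] bytes = term.getBytes(StandardCharsets.UTF_8);
      bodyOut.writeInt(bytes.length); // length prefix
      bodyOut.write(bytes);           // written immediately, nothing retained
      numEntries++;
    }

    // Only now are the entry count and total body size known.
    Metadata finish() throws IOException {
      bodyOut.flush();
      return new Metadata(numEntries, bodyOut.size());
    }
  }

  // Returns {body, meta}: two separate byte streams, like two IndexOutputs.
  static byte[][] build(List<String> terms) {
    try {
      ByteArrayOutputStream body = new ByteArrayOutputStream();
      ByteArrayOutputStream meta = new ByteArrayOutputStream();
      StreamingBuilder builder = new StreamingBuilder(new DataOutputStream(body));
      for (String t : terms) builder.add(t);
      builder.finish().save(new DataOutputStream(meta));
      return new byte[][] {body.toByteArray(), meta.toByteArray()};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // The reader consumes the metadata first, then the body it describes.
  static List<String> read(byte[] body, byte[] meta) {
    try {
      Metadata m = Metadata.load(new DataInputStream(new ByteArrayInputStream(meta)));
      if (body.length != m.numBodyBytes()) throw new IOException("body size mismatch");
      DataInputStream bodyIn = new DataInputStream(new ByteArrayInputStream(body));
      List<String> terms = new ArrayList<>(m.numEntries());
      for (int i = 0; i < m.numEntries(); i++) {
        byte[] bytes = new byte[bodyIn.readInt()];
        bodyIn.readFully(bytes);
        terms.add(new String(bytes, StandardCharsets.UTF_8));
      }
      return terms;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

For example, `TwoOutputSketch.build(List.of("apple", "banana", "cherry"))` yields a 29-byte body and an 8-byte metadata stream, and `read` on those two arrays returns the original list. The design point is that the writer never has to go back and patch a header in the body stream, which a sequential, write-once output cannot do.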

Perhaps we could even do this in a separate PR? I could look into it as part
of https://github.com/apache/lucene/issues/12902. Let me know if there are any
other places where we should be doing this.

> Yes, this can be very promising :) The fact that it is FST and contains
all terms makes it efficient to skip non-existent terms.

Sounds exciting. I imagine we could drop an entire clause with a single
FST lookup?
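A toy sketch of the clause-skipping idea, assuming a `TreeMap` as a stand-in for the FST term dictionary (a real FST lookup would instead follow arcs byte by byte and stop at the first missing one). Because the dictionary holds every term, a required clause whose term is absent lets the whole conjunction return no hits without reading any postings. `ClausePruningSketch` and its methods are hypothetical, not Lucene's query API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical toy model: a sorted term -> postings map stands in for the FST
// term dictionary. Not Lucene's BooleanQuery/TermsEnum API.
final class ClausePruningSketch {
  private final NavigableMap<String, int[]> termDict = new TreeMap<>();
  int postingsReads = 0; // how many postings lists we actually touched

  void index(String term, int... docIds) { termDict.put(term, docIds); }

  // Conjunction of required terms (an empty list is treated as "no hits").
  List<Integer> conjunction(List<String> requiredTerms) {
    // Phase 1: dictionary lookups only. One missing required term means
    // the whole clause matches nothing, so no postings are read at all.
    for (String term : requiredTerms) {
      if (!termDict.containsKey(term)) return List.of();
    }
    // Phase 2: every term exists, so intersect the postings lists.
    List<Integer> result = null;
    for (String term : requiredTerms) {
      int[] postings = termDict.get(term);
      postingsReads++;
      List<Integer> docs = new ArrayList<>();
      for (int doc : postings) {
        if (result == null || result.contains(doc)) docs.add(doc);
      }
      result = docs;
    }
    return result == null ? List.of() : result;
  }
}
```

With `lucene` indexed in docs 1, 2, 3 and `tantivy` in docs 2, 3, 4, the conjunction of both returns `[2, 3]` after two postings reads, while a conjunction that includes an unindexed term returns an empty list without reading any postings.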

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
