dungba88 commented on issue #12513: URL: https://github.com/apache/lucene/issues/12513#issuecomment-1857252458
I'm still working through this thread, so pardon me if I ask something that has already been discussed.

> Yes, I actually tried to use FSTPostingsFormat in the benchmarks game and I had to increase the heap size from 4g to 32g to workaround the in-heap memory demand

Are you referring to the building side (`FSTTermsWriter`), given that `FSTTermsReader` is [already off-heap](https://github.com/apache/lucene/blob/main/lucene/codecs/src/java/org/apache/lucene/codecs/memory/FSTTermsReader.java#L196)? If so, the recent change to stream the FST to disk while building could help: https://github.com/apache/lucene/pull/12624. We can plug the `out` IndexOutput already created in `FSTTermsWriter` into the FieldMetadata.FSTCompiler. One catch is that we need another IndexOutput for storing the FST metadata, as it's not possible to write the metadata into the same IndexOutput as the main FST body :(

I think we can even do this as a separate PR? I could look into it as part of https://github.com/apache/lucene/issues/12902. Let me know if there are any other places that should be doing this.

> Yes, this can be very promising :) The fact that it is FST and contains all terms makes it efficient to skip no-existent terms.

Sounds exciting. I could imagine we can drop an entire clause with a single FST look-up?
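
To make the last idea concrete, here is a rough sketch of what I have in mind. It is not Lucene code: a plain `TreeSet` stands in for the FST term dictionary, and `ClausePruning`, `pruneOptional` and `canMatchAll` are hypothetical names made up for illustration. The only assumption is that the dictionary holds every term of the field, so a failed look-up proves the term has no postings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only: the Set stands in for an FST that
// contains every term of a field. None of these names exist in Lucene.
final class ClausePruning {

    /** Keeps only the optional terms that exist in the term dictionary. */
    public static List<String> pruneOptional(Set<String> termDict, List<String> terms) {
        List<String> kept = new ArrayList<>();
        for (String term : terms) {
            // One dictionary look-up decides whether the clause can match at all.
            if (termDict.contains(term)) {
                kept.add(term);
            }
        }
        return kept;
    }

    /** Returns false as soon as one required term is missing from the dictionary. */
    public static boolean canMatchAll(Set<String> termDict, List<String> required) {
        for (String term : required) {
            if (!termDict.contains(term)) {
                return false; // the whole conjunction cannot match
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> termDict = new TreeSet<>(List.of("fst", "heap", "lucene"));
        System.out.println(pruneOptional(termDict, List.of("lucene", "solr", "fst")));
        System.out.println(canMatchAll(termDict, List.of("lucene", "solr")));
    }
}
```

With a real FST the `contains` call would become a single FST look-up per term, which is where the saving would come from.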