dungba88 commented on issue #12513:
URL: https://github.com/apache/lucene/issues/12513#issuecomment-1857252458

   I'm still catching up on this thread, so pardon me if I ask something that 
has already been discussed.
   
   > Yes, I actually tried to use FSTPostingsFormat in the benchmarks game and 
I had to increase the heap size from 4g to 32g to workaround the in-heap memory 
demand
   
   Are you referring to the `FSTTermsWriter` case for building, as 
`FSTTermsReader` is [already 
off-heap](https://github.com/apache/lucene/blob/main/lucene/codecs/src/java/org/apache/lucene/codecs/memory/FSTTermsReader.java#L196)?
 If so, the recent change to stream the FST to disk while building could help: 
https://github.com/apache/lucene/pull/12624. We can plug the `out` IndexOutput 
already created in `FSTTermsWriter` into the FieldMetadata.FSTCompiler. One catch 
is that we need another IndexOutput for storing the FST metadata, as it's not 
possible to write the metadata into the same IndexOutput as the main FST body :(
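
   To illustrate the two-output shape described above, here is a minimal sketch 
using only `java.io`. It does not use Lucene's API: `StreamingCompiler`, 
`addNode`, and `finish` are hypothetical stand-ins, and the metadata fields 
(last node address, total byte count) are assumptions chosen for illustration. 
The point it tries to show is that the body is flushed as it is built, while 
some values are only known at the end and go to a second output.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class TwoOutputSketch {

    /** Hypothetical stand-in for a compiler that streams its body while building. */
    static final class StreamingCompiler {
        private final DataOutputStream body;
        private long bytesWritten = 0;
        private long lastNodeAddress = -1;

        StreamingCompiler(DataOutputStream body) {
            this.body = body;
        }

        /** Appends one serialized node to the body output immediately. */
        void addNode(byte[] node) throws IOException {
            lastNodeAddress = bytesWritten;
            body.write(node);
            bytesWritten += node.length;
        }

        /** Writes values that are only known once the body is complete. */
        void finish(DataOutputStream meta) throws IOException {
            meta.writeLong(lastNodeAddress);
            meta.writeLong(bytesWritten);
        }
    }

    /** Returns {body size, metadata size} in bytes for a tiny example. */
    static long[] demo() {
        ByteArrayOutputStream bodyBytes = new ByteArrayOutputStream();
        ByteArrayOutputStream metaBytes = new ByteArrayOutputStream();
        try (DataOutputStream body = new DataOutputStream(bodyBytes);
             DataOutputStream meta = new DataOutputStream(metaBytes)) {
            StreamingCompiler compiler = new StreamingCompiler(body);
            compiler.addNode(new byte[] {1, 2, 3});
            compiler.addNode(new byte[] {4, 5});
            compiler.finish(meta); // metadata goes to its own output
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new long[] {bodyBytes.size(), metaBytes.size()};
    }

    public static void main(String[] args) {
        long[] sizes = demo();
        System.out.println("body=" + sizes[0] + " bytes, meta=" + sizes[1] + " bytes");
    }
}
```

   In the real `FSTTermsWriter` the body output would be the existing `out` 
IndexOutput and the second output would be a new one, as discussed above.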
   
   I think we can even do this as a separate PR? I could look into it as part 
of https://github.com/apache/lucene/issues/12902. Let me know if there are any 
other places that should be doing this.
   
   > Yes, this can be very promising :) The fact that it is FST and contains 
all terms makes it efficient to skip no-existent terms.
   
   Sounds exciting. I imagine we could drop an entire clause with a single 
FST look-up?
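
   A rough sketch of what I have in mind, assuming a conjunctive clause and 
using a plain `TreeSet` as a stand-in for the FST term dictionary (the class 
and method names here are hypothetical, not Lucene APIs): if any required term 
is missing from the dictionary, the whole clause cannot match and can be dropped 
without touching postings.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ClauseSkipSketch {

    /**
     * Returns false as soon as one required term is absent from the dictionary,
     * meaning the conjunctive clause cannot match any document.
     */
    static boolean conjunctionCanMatch(Set<String> termDict, List<String> requiredTerms) {
        for (String term : requiredTerms) {
            if (!termDict.contains(term)) {
                return false; // a single failed look-up rules out the clause
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Stand-in for a dictionary that contains all terms of a field.
        Set<String> termDict = new TreeSet<>(List.of("fst", "lucene", "postings"));
        System.out.println(conjunctionCanMatch(termDict, List.of("lucene", "fst")));
        System.out.println(conjunctionCanMatch(termDict, List.of("lucene", "missing")));
    }
}
```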
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

