[
https://issues.apache.org/jira/browse/LUCENE-9481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17184601#comment-17184601
]
Michael McCandless commented on LUCENE-9481:
--------------------------------------------
Wow, that is a fascinating result!
Maybe, instead of a paged {{byte[]}} storage, we should have a store that just
grows a single {{byte[]}} (e.g. using {{ArrayUtil.grow}}) up until reasonable
limits? Then the resulting FST could use the same {{ReversedBytesReader}}?
> Improve FST performance created by Builder
> ------------------------------------------
>
> Key: LUCENE-9481
> URL: https://issues.apache.org/jira/browse/LUCENE-9481
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/FSTs
> Reporter: Anh Dung Bui
> Priority: Major
>
> Hi,
>
> We are using Lucene 8 and recently we observed a performance issue with FST.
> When FST is used directly from *org.apache.lucene.util.fst.Builder.finish()*,
> its about 40-50% slower than when it's first saved to a *DataOutput* (e.g
> filesystem or heap) and loaded again. The FST we used is about 1.6MB.
>
> Using VisualVM, we tracked the issue down to the *DataInput.readVInt* method,
> which in turn calls the readByte of the reversed BytesReader. When FST is
> loaded from filesystem/heap, it's using OnHeapFSTStore which use
> ReverseBytesReader, but when it's created by the Builder, it's backed by the
> BytesStore. The problem is, when the number of blocks of BytesStore is not 1,
> it'll use an [anonymous
> class|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java#L453]
> for reading bytes, and possibly because of JVM inlining, the
> ReverseBytesReader.readByte is inlined, while that of the anonymous class is
> not. We confirmed this by adding XX:+PrintInlining flag.
> {code:java}
> @ 80 org.apache.lucene.util.fst.BytesStore$2::readByte (68 bytes) too
> big{code}
> {code:java}
> @ 17 org.apache.lucene.util.fst.ReverseBytesReader::readByte (17 bytes)
> inline (hot){code}
>
> Currently we have 2 workarounds, which can also confirm the above theory:
> * Use a larger *bytesPageBits* in Builder, so that the block size is greater
> than the FST size.
> * Save and load the FST to heap after building, e.g using
> GrowableByteArrayDataOutput and ByteArrayDataInput.
> But I think it would be better if this optimization is done inside
> *Builder.finish()* instead, e.g by auto-adjust the block size of BytesStore
> or copy it to a FSTStore if possible, since none of the 2 workarounds above
> look great. We have many FST with largely different sizes, so with the first
> option we need to maintain the bytesPageBits for each dictionary, otherwise
> we risk wasting unnecessary heap or using insufficient block size. I also
> think this issue would affect other Lucene users too.
>
> For reference, the benchmark we setup is as below:
> * Warm up both FST for 10.000 iterations (10.000 seems to be the JVM
> inlining trigger threshold)
> * Run 100 tests, each test run 100.000 iterations
> * Each iteration is simply call *org.apache.lucene.util.fst.Util.get()* with
> the same 4-character word.
> It reports that 98 out of 100 tests, the one with save/load logic is 40-50%
> faster on average.
> Thanks
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]