[ 
https://issues.apache.org/jira/browse/LUCENE-9481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17184601#comment-17184601
 ] 

Michael McCandless commented on LUCENE-9481:
--------------------------------------------

Wow, that is a fascinating result!

Maybe, instead of a paged {{byte[]}} storage, we should have a store that just 
grows a single {{byte[]}} (e.g. using {{ArrayUtil.grow}}) up until reasonable 
limits?  Then the resulting FST could use the same {{ReversedBytesReader}}?

> Improve FST performance created by Builder
> ------------------------------------------
>
>                 Key: LUCENE-9481
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9481
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/FSTs
>            Reporter: Anh Dung Bui
>            Priority: Major
>
> Hi,
>  
> We are using Lucene 8 and recently we observed a performance issue with FST. 
> When FST is used directly from *org.apache.lucene.util.fst.Builder.finish()*, 
> its about 40-50% slower than when it's first saved to a *DataOutput* (e.g 
> filesystem or heap) and loaded again. The FST we used is about 1.6MB.
>  
> Using VisualVM, we tracked the issue down to the *DataInput.readVInt* method, 
> which in turn calls the readByte of the reversed BytesReader. When FST is 
> loaded from filesystem/heap, it's using OnHeapFSTStore which use 
> ReverseBytesReader, but when it's created by the Builder, it's backed by the 
> BytesStore. The problem is, when the number of blocks of BytesStore is not 1, 
> it'll use an [anonymous 
> class|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java#L453]
>  for reading bytes, and possibly because of JVM inlining, the 
> ReverseBytesReader.readByte is inlined, while that of the anonymous class is 
> not. We confirmed this by adding XX:+PrintInlining flag.
> {code:java}
> @ 80 org.apache.lucene.util.fst.BytesStore$2::readByte (68 bytes) too 
> big{code}
> {code:java}
> @ 17 org.apache.lucene.util.fst.ReverseBytesReader::readByte (17 bytes) 
> inline (hot){code}
>  
> Currently we have 2 workarounds, which can also confirm the above theory:
>  * Use a larger *bytesPageBits* in Builder, so that the block size is greater 
> than the FST size.
>  * Save and load the FST to heap after building, e.g using 
> GrowableByteArrayDataOutput and ByteArrayDataInput.
> But I think it would be better if this optimization is done inside 
> *Builder.finish()* instead, e.g by auto-adjust the block size of BytesStore 
> or copy it to a FSTStore if possible, since none of the 2 workarounds above 
> look great. We have many FST with largely different sizes, so with the first 
> option we need to maintain the bytesPageBits for each dictionary, otherwise 
> we risk wasting unnecessary heap or using insufficient block size. I also 
> think this issue would affect other Lucene users too.
>  
> For reference, the benchmark we setup is as below:
>  * Warm up both FST for 10.000 iterations (10.000 seems to be the JVM 
> inlining trigger threshold)
>  * Run 100 tests, each test run 100.000 iterations
>  * Each iteration is simply call *org.apache.lucene.util.fst.Util.get()* with 
> the same 4-character word.
> It reports that 98 out of 100 tests, the one with save/load logic is 40-50% 
> faster on average.
> Thanks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to