[ https://issues.apache.org/jira/browse/LUCENE-9481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anh Dung Bui updated LUCENE-9481:
---------------------------------
    Description: 
Hi,

 

We are using Lucene 8 and recently observed a performance issue with FST. 
When an FST is used directly from *org.apache.lucene.util.fst.Builder.finish()*, 
it's about 40-50% slower than when it's first saved to a *DataOutput* (e.g. 
filesystem or heap) and loaded again. The FST we used is about 1.6 MB.

 

Using VisualVM, we tracked the issue down to the *DataInput.readVInt* method, 
which in turn calls readByte on the reversed BytesReader. When the FST is 
loaded from filesystem/heap, it uses OnHeapFSTStore, which uses 
ReverseBytesReader; but when it's created by the Builder, it's backed by 
BytesStore. The problem is that when BytesStore holds more than one block, it 
uses an [anonymous 
class|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java#L453]
 for reading bytes, and likely because of JVM inlining limits, 
ReverseBytesReader.readByte is inlined while the anonymous class's readByte is 
not. We confirmed this by adding the -XX:+PrintInlining flag.
{code:java}
@ 80 org.apache.lucene.util.fst.BytesStore$2::readByte (68 bytes) too big
@ 17 org.apache.lucene.util.fst.ReverseBytesReader::readByte (17 bytes) inline (hot)
{code}
 

Currently we have two workarounds, which also confirm the above theory:
 * Use a larger *bytesPageBits* in Builder, so that a single block is larger 
than the whole FST.
 * Save the FST and load it back into heap after building, e.g. using 
GrowableByteArrayDataOutput and ByteArrayDataInput.

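To illustrate, the second workaround can be sketched roughly as below. This is a hedged sketch against the Lucene 8 API (it needs lucene-core on the classpath); the class name, outputs type (PositiveIntOutputs), and buffer size are placeholder assumptions, not part of the original report.

{code:java}
import java.io.IOException;

import org.apache.lucene.store.ByteArrayDataInput;
import org.apache.lucene.store.GrowableByteArrayDataOutput;
import org.apache.lucene.util.fst.Builder;
import org.apache.lucene.util.fst.FST;
import org.apache.lucene.util.fst.PositiveIntOutputs;

public class FstRoundTrip {
  // Round-trip the freshly built FST through heap so that lookups go
  // through ReverseBytesReader instead of the BytesStore anonymous reader.
  static FST<Long> roundTrip(Builder<Long> builder) throws IOException {
    FST<Long> direct = builder.finish();            // backed by BytesStore
    GrowableByteArrayDataOutput out = new GrowableByteArrayDataOutput(1 << 8);
    direct.save(out);                               // serialize to heap
    ByteArrayDataInput in =
        new ByteArrayDataInput(out.getBytes(), 0, out.getPosition());
    return new FST<>(in, PositiveIntOutputs.getSingleton()); // OnHeapFSTStore
  }
}
{code}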
But I think it would be better if this optimization were done inside 
*Builder.finish()* instead, e.g. by auto-adjusting the block size of the 
BytesStore or copying it to an FSTStore when possible, since neither of the 
two workarounds above looks great. We have many FSTs with widely different 
sizes, so with the first option we would need to maintain bytesPageBits for 
each dictionary; otherwise we risk wasting heap or using an insufficient block 
size. I also think this issue would affect other Lucene users.

 

For reference, the benchmark we set up is as follows:
 * Warm up both FSTs for 10,000 iterations (10,000 appears to be the JVM 
inlining trigger threshold).
 * Run 100 tests; each test runs 100,000 iterations.
 * Each iteration simply calls *org.apache.lucene.util.fst.Util.get()* with 
the same 4-character word.
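The measurement loop above can be sketched as a small, self-contained harness like the one below. The class and method names are hypothetical, and the lookup Runnable stands in for the real Util.get() call:

```java
public class FstBench {
  // Warm up to reach the JIT inlining threshold, then average the wall-clock
  // time of repeated timed trials.
  static long averageTrialNanos(Runnable lookup, int warmup, int trials,
                                int itersPerTrial) {
    for (int i = 0; i < warmup; i++) {
      lookup.run(); // warm-up phase: trigger JIT compilation/inlining
    }
    long total = 0;
    for (int t = 0; t < trials; t++) {
      long start = System.nanoTime();
      for (int i = 0; i < itersPerTrial; i++) {
        lookup.run(); // e.g. Util.get(fst, word) in the real benchmark
      }
      total += System.nanoTime() - start;
    }
    return total / trials; // average nanoseconds per trial
  }

  public static void main(String[] args) {
    int[] calls = {0};
    long avg = averageTrialNanos(() -> calls[0]++, 10_000, 100, 100_000);
    System.out.println("calls=" + calls[0] + " avgNanos=" + avg);
  }
}
```

Running this once per FST variant (direct vs. saved/loaded) and comparing the averages reproduces the comparison described above.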

In 98 out of 100 tests, the FST that went through the save/load step was 
40-50% faster on average.

Thanks



> Improve FST performance created by Builder
> ------------------------------------------
>
>                 Key: LUCENE-9481
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9481
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/FSTs
>            Reporter: Anh Dung Bui
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
