[ 
https://issues.apache.org/jira/browse/LUCENE-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Sokolov updated LUCENE-8781:
---------------------------------
    Description: 
This issue is for exploring an alternate FST encoding of Arcs as full-sized 
arrays so Arcs are addressed directly by label, avoiding binary search that we 
use today for arrays of Arcs.

h3. Testing

ant test passes. I added some unit tests that were helpful in uncovering bugs 
while
implementing which are more difficult to chase down when uncovered by the 
randomized testing we already do. They don't really test anything new; they're 
just more focused.

I'm not sure why, but ant precommit failed for me with:

{noformat}
 ...lucene-solr/solr/common-build.xml:536: Check for forbidden API calls failed 
while scanning class 
'org.apache.solr.metrics.reporters.SolrGangliaReporterTest' 
(SolrGangliaReporterTest.java): java.lang.ClassNotFoundException: 
info.ganglia.gmetric4j.gmetric.GMetric (while looking up details about 
referenced class 'info.ganglia.gmetric4j.gmetric.GMetric')
{noformat}

I also got Test2BFST running (it was originally timing out due to excessive 
calls to ramBytesUsage(), which seems to have gotten slow), and it passed; that 
change isn't include here.

h4. Micro-benchmark

I timed lookups in FST via FSTEnum.seekExact in a unit test under various 
conditions. 

h5. English words

A test of looking up existing words in a dictionary of ~170000 English words 
shows improvements; the numbers listed are % change in FST size, time to look 
up (FSTEnum.seekExact) words that are in the dict, and time to look up random 
strings that are not in the dict. The comparison is against the current 
codebase with the optimization disabled. A separate comparison of showed no 
significant change of the baseline (no opto applied) vs the current master FST 
impl with no code changes applied.

||  load=2    ||   load=4     ||  load=16 ||
| +4, -6, -7  | +18, -11, -8 | +22, -11.5, -7 |

The "load factor" used for those measurements controls when direct array arc 
encoding is used;
namely when the number of outgoing arcs was > load * (max label - min label).

h5. sequential and random terms

The same test, with terms being a sequence of integers as strings shows a 
larger improvement, around 20% (load=4). This is presumably the best case for 
this delta, where every Arc is encoded as a direct lookup.

When random lowercase ASCII strings are used, a smaller improvement of around 
4% is seen.

h4. luceneutil

Testing w/luceneutil (wikimediumall) we see improvements mostly in the PKLookup 
case. Other results seem noisy, with perhaps a small improvment in some of the 
queries.

{noformat}
                    Task    QPS base      StdDev    QPS opto      StdDev        
        Pct diff
              OrHighHigh        6.93      (3.0%)        6.89      (3.1%)   
-0.5% (  -6% -    5%)
               OrHighMed       45.15      (3.9%)       44.92      (3.5%)   
-0.5% (  -7% -    7%)
                Wildcard        8.72      (4.7%)        8.69      (4.6%)   
-0.4% (  -9% -    9%)
              AndHighLow      274.11      (2.6%)      273.58      (3.1%)   
-0.2% (  -5% -    5%)
               OrHighLow      241.41      (1.9%)      241.11      (3.5%)   
-0.1% (  -5% -    5%)
              AndHighMed       52.23      (4.1%)       52.41      (5.3%)    
0.3% (  -8% -   10%)
                 MedTerm     1026.24      (3.1%)     1030.52      (4.3%)    
0.4% (  -6% -    8%)
                HighTerm     1111.10      (3.4%)     1116.70      (4.0%)    
0.5% (  -6% -    8%)
   HighTermDayOfYearSort       14.59      (8.2%)       14.73      (9.3%)    
1.0% ( -15% -   20%)
             AndHighHigh       13.45      (6.2%)       13.61      (4.4%)    
1.2% (  -8% -   12%)
       HighTermMonthSort       63.09     (12.5%)       64.13     (10.9%)    
1.6% ( -19% -   28%)
                 LowTerm     1338.94      (3.3%)     1383.90      (5.5%)    
3.4% (  -5% -   12%)
                PKLookup      120.45      (2.5%)      130.91      (3.5%)    
8.7% (   2% -   15%)
{noformat}

h4.FST perf tests

I ran LookupBenchmarkTest to see the impact on the suggesters which make heavy 
use of FSTs. Some show little or no improvement, but in some cases there are 
substantial gains.

!FST-2-4.png!
!FST-6-9.png!
!FST-size.png!

  chart TK

h3. API change / implementation notes

The only change in the public FST API is that the Builder constructor now takes 
an additional
boolean ("useDirectArcAddressing") controlling whether or not this new 
optimization is
applied. However, FST's internal details are not really hidden, so in practice 
any change to its
encoding can have ripple effects in other classes.

This is because the FST decoding is repeated in several places in the code 
base, sometimes with
subtle variations: eg FST, FSTEnum, and fst.Util have very similar, but not 
shared, traversal code,
and there are a few other places this same or similar decoding algorithm 
appears as well (eg
blocktreeords codec). I do think it would be useful to make this code more DRY, 
and to hide the
traversal implementation within a single class (or at least in the package), 
but it seemed better to
do that separately. I also think there are possibilities for improving this 
encoding further, maybe
by encoding Arcs in contiguous blocks with gaps in between, but before 
attempting anything like
that, I think it would good to do that refactor to make the code in its current 
form easier to
understand and work with.


  was:
This issue is for exploring an alternate FST encoding of Arcs as full-sized 
arrays so Arcs are addressed directly by label, avoiding binary search that we 
use today for arrays of Arcs.

h3. Testing

ant test passes. I added some unit tests that were helpful in uncovering bugs 
while
implementing which are more difficult to chase down when uncovered by the 
randomized testing we already do. They don't really test anything new; they're 
just more focused.

I'm not sure why, but ant precommit failed for me with:

{noformat}
 ...lucene-solr/solr/common-build.xml:536: Check for forbidden API calls failed 
while scanning class 
'org.apache.solr.metrics.reporters.SolrGangliaReporterTest' 
(SolrGangliaReporterTest.java): java.lang.ClassNotFoundException: 
info.ganglia.gmetric4j.gmetric.GMetric (while looking up details about 
referenced class 'info.ganglia.gmetric4j.gmetric.GMetric')
{noformat}

I also got Test2BFST running (it was originally timing out due to excessive 
calls to ramBytesUsage(), which seems to have gotten slow), and it passed; that 
change isn't include here.

h4. Micro-benchmark

I timed lookups in FST via FSTEnum.seekExact in a unit test under various 
conditions. 

h5. English words

A test of looking up existing words in a dictionary of ~170000 English words 
shows improvements; the numbers listed are % change in FST size, time to look 
up (FSTEnum.seekExact) words that are in the dict, and time to look up random 
strings that are not in the dict. The comparison is against the current 
codebase with the optimization disabled. A separate comparison of showed no 
significant change of the baseline (no opto applied) vs the current master FST 
impl with no code changes applied.

   load=2    ||   load=4     ||  load=16
 +4, -6, -7  | +18, -11, -8 | +22, -11.5, -7

The "load factor" used for those measurements controls when direct array arc 
encoding is used;
namely when the number of outgoing arcs was > load * (max label - min label).

h5. sequential and random terms

The same test, with terms being a sequence of integers as strings shows a 
larger improvement, around 20% (load=4). This is presumably the best case for 
this delta, where every Arc is encoded as a direct lookup.

When random lowercase ASCII strings are used, a smaller improvement of around 
4% is seen.

h4. luceneutil

Testing w/luceneutil (wikimediumall) we see improvements mostly in the PKLookup 
case. Other results seem noisy, with perhaps a small improvment in some of the 
queries.

{noformat}
                    Task    QPS base      StdDev    QPS opto      StdDev        
        Pct diff
              OrHighHigh        6.93      (3.0%)        6.89      (3.1%)   
-0.5% (  -6% -    5%)
               OrHighMed       45.15      (3.9%)       44.92      (3.5%)   
-0.5% (  -7% -    7%)
                Wildcard        8.72      (4.7%)        8.69      (4.6%)   
-0.4% (  -9% -    9%)
              AndHighLow      274.11      (2.6%)      273.58      (3.1%)   
-0.2% (  -5% -    5%)
               OrHighLow      241.41      (1.9%)      241.11      (3.5%)   
-0.1% (  -5% -    5%)
              AndHighMed       52.23      (4.1%)       52.41      (5.3%)    
0.3% (  -8% -   10%)
                 MedTerm     1026.24      (3.1%)     1030.52      (4.3%)    
0.4% (  -6% -    8%)
                HighTerm     1111.10      (3.4%)     1116.70      (4.0%)    
0.5% (  -6% -    8%)
   HighTermDayOfYearSort       14.59      (8.2%)       14.73      (9.3%)    
1.0% ( -15% -   20%)
             AndHighHigh       13.45      (6.2%)       13.61      (4.4%)    
1.2% (  -8% -   12%)
       HighTermMonthSort       63.09     (12.5%)       64.13     (10.9%)    
1.6% ( -19% -   28%)
                 LowTerm     1338.94      (3.3%)     1383.90      (5.5%)    
3.4% (  -5% -   12%)
                PKLookup      120.45      (2.5%)      130.91      (3.5%)    
8.7% (   2% -   15%)
{noformat}

h4.FST perf tests

I ran LookupBenchmarkTest to see the impact on the suggesters which make heavy 
use of FSTs. Some show little or no improvement, but in some cases there are 
substantial gains.

  chart TK

h3. API change / implementation notes

The only change in the public FST API is that the Builder constructor now takes 
an additional
boolean ("useDirectArcAddressing") controlling whether or not this new 
optimization is
applied. However, FST's internal details are not really hidden, so in practice 
any change to its
encoding can have ripple effects in other classes.

This is because the FST decoding is repeated in several places in the code 
base, sometimes with
subtle variations: eg FST, FSTEnum, and fst.Util have very similar, but not 
shared, traversal code,
and there are a few other places this same or similar decoding algorithm 
appears as well (eg
blocktreeords codec). I do think it would be useful to make this code more DRY, 
and to hide the
traversal implementation within a single class (or at least in the package), 
but it seemed better to
do that separately. I also think there are possibilities for improving this 
encoding further, maybe
by encoding Arcs in contiguous blocks with gaps in between, but before 
attempting anything like
that, I think it would good to do that refactor to make the code in its current 
form easier to
understand and work with.



> Explore FST direct array arc encoding 
> --------------------------------------
>
>                 Key: LUCENE-8781
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8781
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Mike Sokolov
>            Priority: Major
>         Attachments: FST-2-4.png, FST-6-9.png, FST-size.png
>
>
> This issue is for exploring an alternate FST encoding of Arcs as full-sized 
> arrays so Arcs are addressed directly by label, avoiding binary search that 
> we use today for arrays of Arcs.
> h3. Testing
> ant test passes. I added some unit tests that were helpful in uncovering bugs 
> while
> implementing which are more difficult to chase down when uncovered by the 
> randomized testing we already do. They don't really test anything new; 
> they're just more focused.
> I'm not sure why, but ant precommit failed for me with:
> {noformat}
>  ...lucene-solr/solr/common-build.xml:536: Check for forbidden API calls 
> failed while scanning class 
> 'org.apache.solr.metrics.reporters.SolrGangliaReporterTest' 
> (SolrGangliaReporterTest.java): java.lang.ClassNotFoundException: 
> info.ganglia.gmetric4j.gmetric.GMetric (while looking up details about 
> referenced class 'info.ganglia.gmetric4j.gmetric.GMetric')
> {noformat}
> I also got Test2BFST running (it was originally timing out due to excessive 
> calls to ramBytesUsage(), which seems to have gotten slow), and it passed; 
> that change isn't include here.
> h4. Micro-benchmark
> I timed lookups in FST via FSTEnum.seekExact in a unit test under various 
> conditions. 
> h5. English words
> A test of looking up existing words in a dictionary of ~170000 English words 
> shows improvements; the numbers listed are % change in FST size, time to look 
> up (FSTEnum.seekExact) words that are in the dict, and time to look up random 
> strings that are not in the dict. The comparison is against the current 
> codebase with the optimization disabled. A separate comparison of showed no 
> significant change of the baseline (no opto applied) vs the current master 
> FST impl with no code changes applied.
> ||  load=2    ||   load=4     ||  load=16 ||
> | +4, -6, -7  | +18, -11, -8 | +22, -11.5, -7 |
> The "load factor" used for those measurements controls when direct array arc 
> encoding is used;
> namely when the number of outgoing arcs was > load * (max label - min label).
> h5. sequential and random terms
> The same test, with terms being a sequence of integers as strings shows a 
> larger improvement, around 20% (load=4). This is presumably the best case for 
> this delta, where every Arc is encoded as a direct lookup.
> When random lowercase ASCII strings are used, a smaller improvement of around 
> 4% is seen.
> h4. luceneutil
> Testing w/luceneutil (wikimediumall) we see improvements mostly in the 
> PKLookup case. Other results seem noisy, with perhaps a small improvment in 
> some of the queries.
> {noformat}
>                     Task    QPS base      StdDev    QPS opto      StdDev      
>           Pct diff
>               OrHighHigh        6.93      (3.0%)        6.89      (3.1%)   
> -0.5% (  -6% -    5%)
>                OrHighMed       45.15      (3.9%)       44.92      (3.5%)   
> -0.5% (  -7% -    7%)
>                 Wildcard        8.72      (4.7%)        8.69      (4.6%)   
> -0.4% (  -9% -    9%)
>               AndHighLow      274.11      (2.6%)      273.58      (3.1%)   
> -0.2% (  -5% -    5%)
>                OrHighLow      241.41      (1.9%)      241.11      (3.5%)   
> -0.1% (  -5% -    5%)
>               AndHighMed       52.23      (4.1%)       52.41      (5.3%)    
> 0.3% (  -8% -   10%)
>                  MedTerm     1026.24      (3.1%)     1030.52      (4.3%)    
> 0.4% (  -6% -    8%)
>                 HighTerm     1111.10      (3.4%)     1116.70      (4.0%)    
> 0.5% (  -6% -    8%)
>    HighTermDayOfYearSort       14.59      (8.2%)       14.73      (9.3%)    
> 1.0% ( -15% -   20%)
>              AndHighHigh       13.45      (6.2%)       13.61      (4.4%)    
> 1.2% (  -8% -   12%)
>        HighTermMonthSort       63.09     (12.5%)       64.13     (10.9%)    
> 1.6% ( -19% -   28%)
>                  LowTerm     1338.94      (3.3%)     1383.90      (5.5%)    
> 3.4% (  -5% -   12%)
>                 PKLookup      120.45      (2.5%)      130.91      (3.5%)    
> 8.7% (   2% -   15%)
> {noformat}
> h4.FST perf tests
> I ran LookupBenchmarkTest to see the impact on the suggesters which make 
> heavy use of FSTs. Some show little or no improvement, but in some cases 
> there are substantial gains.
> !FST-2-4.png!
> !FST-6-9.png!
> !FST-size.png!
>   chart TK
> h3. API change / implementation notes
> The only change in the public FST API is that the Builder constructor now 
> takes an additional
> boolean ("useDirectArcAddressing") controlling whether or not this new 
> optimization is
> applied. However, FST's internal details are not really hidden, so in 
> practice any change to its
> encoding can have ripple effects in other classes.
> This is because the FST decoding is repeated in several places in the code 
> base, sometimes with
> subtle variations: eg FST, FSTEnum, and fst.Util have very similar, but not 
> shared, traversal code,
> and there are a few other places this same or similar decoding algorithm 
> appears as well (eg
> blocktreeords codec). I do think it would be useful to make this code more 
> DRY, and to hide the
> traversal implementation within a single class (or at least in the package), 
> but it seemed better to
> do that separately. I also think there are possibilities for improving this 
> encoding further, maybe
> by encoding Arcs in contiguous blocks with gaps in between, but before 
> attempting anything like
> that, I think it would good to do that refactor to make the code in its 
> current form easier to
> understand and work with.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to