[ 
https://issues.apache.org/jira/browse/SOLR-12688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christine Poerschke updated SOLR-12688:
---------------------------------------
    Component/s: contrib - LTR

> LTR Multiple performance fixes + pure DocValues support for FieldValueFeature
> -----------------------------------------------------------------------------
>
>                 Key: SOLR-12688
>                 URL: https://issues.apache.org/jira/browse/SOLR-12688
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: contrib - LTR
>            Reporter: Stanislav Livotov
>            Priority: Major
>         Attachments: LTRModelHashCodeAfter.png, LTRModelHashCodeBefore.png, 
> LTRSolrFeatureAfter.png, LTRSolrFeatureBefore.png, LTRwithDVOptimisation.png, 
> LTRwithoutDVOptimisation.png, MultiplePerformanceFixes.patch
>
>
> This ticket covers two performance issues and one functional/performance 
> issue that I found while integrating LTR into our e-commerce search engine: 
>  # FieldValueFeature doesn't support pure DocValues fields (stored=false). 
> Note also that for fields that are both stored and DocValues it is 
> suboptimal, because it reads the stored document just to extract a single 
> field. DocValues are faster for such use cases. Below are screenshots of 
> JFR profiles without and with the new DocValues support, for the case when 
> the value can be read from DocValues. 
>  !LTRwithoutDVOptimisation.png! 
>  !LTRwithDVOptimisation.png!
>  # SolrFeature was not implemented optimally for the case when no fq 
> parameter is passed. I'm not entirely sure why the fq parameter was 
> introduced for SolrFeature at all, so I decided not to change the 
> behavior and only optimize the described case. !LTRSolrFeatureBefore.png! 
> !LTRSolrFeatureAfter.png!
>  # LTRScoringModel was a mutable object. This led to the hash code being 
> recalculated on each query, which can take a lot of time when a model is 
> big (in our case we were using LambdaMART with 100 trees and leaves, which 
> took up 3MB on disk). So I decided to make LTRScoringModel immutable and 
> cache the hashCode calculation. Below are the screenshots before and 
> after.  
> !LTRModelHashCodeBefore.png!!LTRModelHashCodeAfter.png!
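
A minimal sketch of the idea in item 3 (immutable object with a hash code computed once). `ImmutableModel` and its fields are hypothetical and much simpler than the real LTRScoringModel; this is not the code from the attached patch.

```java
import java.util.Arrays;

/** Hypothetical, simplified model; NOT the actual LTRScoringModel class. */
final class ImmutableModel {
    private final String name;
    private final float[] weights;
    private final int cachedHashCode;

    ImmutableModel(String name, float[] weights) {
        this.name = name;
        // Defensive copy: later changes to the caller's array cannot alter this object.
        this.weights = weights.clone();
        // Computed once here instead of on every hashCode() call.
        this.cachedHashCode = 31 * name.hashCode() + Arrays.hashCode(this.weights);
    }

    @Override
    public int hashCode() {
        return cachedHashCode; // constant-time, no traversal of the weights
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ImmutableModel)) return false;
        ImmutableModel other = (ImmutableModel) o;
        // Cheap hash comparison first, full comparison only if the hashes match.
        return cachedHashCode == other.cachedHashCode
            && name.equals(other.name)
            && Arrays.equals(weights, other.weights);
    }
}
```

Caching is only safe because no field can change after construction; with a mutable model the cached value could go stale.
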
> In our case, we had a feature.json file with 8 FieldValueFeatures, 5 
> SolrFeatures and 1 OriginalScoreFeature. 
> Before the optimizations, the performance overhead of LTR reranking of the 
> top 48 documents was 300ms. With all the optimizations in place it 
> decreased to 35ms. 
> Note also that the JFR screenshots were captured on the Solr 6.6 codebase. 
> All the numbers are also taken from Solr version 6.6. 
> I hope that the changes to the DocValues interface (the get() method was 
> removed and advanceExact was added) won't affect it (at least for 
> DenseNumericDocValues it will work as expected).
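
To illustrate the advanceExact-style access mentioned above, here is a small self-contained sketch. `NumericValues` is a hypothetical stand-in that copies only the `advanceExact(int)` and `longValue()` method shapes of Lucene 7+ `NumericDocValues`; `ArrayNumericValues` and `DocValueReader` are invented for illustration and are not part of Solr or of the attached patch.

```java
/** Hypothetical stand-in for the two NumericDocValues methods used here. */
interface NumericValues {
    boolean advanceExact(int docId); // true if docId has a value
    long longValue();                // value of the doc positioned by advanceExact
}

/** Toy array-backed implementation, for illustration only. */
final class ArrayNumericValues implements NumericValues {
    private final long[] values;
    private final boolean[] hasValue;
    private int current = -1;

    ArrayNumericValues(long[] values, boolean[] hasValue) {
        this.values = values.clone();
        this.hasValue = hasValue.clone();
    }

    @Override
    public boolean advanceExact(int docId) {
        current = docId;
        return docId >= 0 && docId < values.length && hasValue[docId];
    }

    @Override
    public long longValue() {
        return values[current];
    }
}

final class DocValueReader {
    /** Returns the doc's value as a float, or defaultValue when the doc has none. */
    static float featureValue(NumericValues dv, int docId, float defaultValue) {
        // Replaces the old random-access get(docId): position first, then read.
        return dv.advanceExact(docId) ? (float) dv.longValue() : defaultValue;
    }
}
```

Unlike this toy version, the real Lucene iterator is forward-only, so callers are expected to request document IDs in non-decreasing order.
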



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
