[jira] [Updated] (SOLR-12688) LTR Multiple performance fixes + pure DocValues support for FieldValueFeature

Stanislav Livotov (JIRA) Tue, 21 Aug 2018 14:36:43 -0700


     [ 
https://issues.apache.org/jira/browse/SOLR-12688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Stanislav Livotov updated SOLR-12688:
-------------------------------------
    Description: 
This ticket covers two performance issues and one functional/performance issue 
that I found while integrating LTR into our e-commerce search engine: 
 # FieldValueFeature doesn't support pure DocValues fields (stored=false). 
Please also note that for fields that are both stored and DocValues it is not 
optimal, because it extracts just one field from the stored document. 
DocValues are faster for such use cases. Below are screenshots of JFR profiles 
without and with the new DocValues support, for the case when the value can be 
read from DocValues. 
 !LTRwithoutDVOptimisation.png! 
 !LTRwithDVOptimisation.png!
 # SolrFeature was not implemented optimally for the case when no fq parameter 
is passed. I'm not sure why both q (which is supposed to be a function query) 
and fq parameters were introduced for the same SolrFeature at all (is there a 
case when they are used together?), so I decided not to change the behavior 
and only optimized the described case. 
!LTRSolrFeatureBefore.png! !LTRSolrFeatureAfter.png!
 # LTRScoringModel was a mutable object. This led to the hash code being 
recalculated on each query, which can take a lot of time when the model is big 
(in our case we were using LambdaMART with 100 trees and leaves, which took 
3MB on disk). So I decided to make LTRScoringModel immutable and cache the 
hashCode calculation. Below are the screenshots before and after. 
!LTRModelHashCodeBefore.png! !LTRModelHashCodeAfter.png!
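
To illustrate the third item, here is a minimal sketch of an immutable object 
that computes its hash code once in the constructor. The class name 
CachedHashModel and its fields are hypothetical stand-ins chosen for this 
example; this is not the actual LTRScoringModel code from the attached patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a large scoring model (NOT the real LTRScoringModel).
final class CachedHashModel {
    private final String name;
    private final List<Double> weights;
    private final int cachedHashCode; // computed once, reused on every lookup

    CachedHashModel(String name, List<Double> weights) {
        this.name = name;
        // Defensive copy so later changes to the caller's list cannot
        // invalidate the cached hash.
        this.weights = Collections.unmodifiableList(new ArrayList<>(weights));
        this.cachedHashCode = 31 * name.hashCode() + this.weights.hashCode();
    }

    @Override
    public int hashCode() {
        // No traversal of the (potentially large) model here.
        return cachedHashCode;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CachedHashModel)) return false;
        CachedHashModel other = (CachedHashModel) o;
        // Cheap hash comparison first, full comparison only on a match.
        return cachedHashCode == other.cachedHashCode
            && name.equals(other.name)
            && weights.equals(other.weights);
    }
}
```

Caching is only safe because the object cannot change after construction, 
which is why immutability and the cached hash go together.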

In our case, we had a feature.json file with 8 FieldValueFeatures, 5 
SolrFeatures and 1 OriginalScoreFeature. 
Before the optimizations, the performance overhead of LTR reranking for the 
top 48 documents was 300ms. With all the optimizations applied, it decreased 
to 35ms. 

Please also note that the JFR screenshots were captured on the Solr 6.6 
codebase. All the numbers were also measured on Solr 6.6. 
I hope that the changes to the DocValues interface (the get() method was 
removed and advanceExact was added) won't affect this (at least for 
DenseNumericDocValues it will work as expected).
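
For context on the interface change mentioned above, here is a small sketch of 
the advanceExact access pattern. ForwardNumericValues, SparseNumericValues and 
FeatureReader are hypothetical names made up for this example; they only 
mirror the advanceExact(int)/longValue() pair of Lucene's NumericDocValues so 
the snippet runs without Lucene on the classpath. It is not the code from the 
patch.

```java
import java.util.Arrays;

// Hypothetical stand-in mirroring the advanceExact/longValue pair.
interface ForwardNumericValues {
    boolean advanceExact(int target); // true if the target doc has a value
    long longValue();                 // value for the doc positioned above
}

// Simplified sparse column: only some doc ids carry a value.
final class SparseNumericValues implements ForwardNumericValues {
    private final int[] docIds;  // sorted ascending
    private final long[] values; // values[i] belongs to docIds[i]
    private int pos = -1;

    SparseNumericValues(int[] docIds, long[] values) {
        this.docIds = docIds.clone();
        this.values = values.clone();
    }

    @Override
    public boolean advanceExact(int target) {
        // Assumption: a real iterator is forward-only; binary search is used
        // here purely to keep the sketch short.
        int idx = Arrays.binarySearch(docIds, target);
        pos = idx >= 0 ? idx : -1;
        return idx >= 0;
    }

    @Override
    public long longValue() {
        if (pos < 0) throw new IllegalStateException("no value for current doc");
        return values[pos];
    }
}

final class FeatureReader {
    // Reads the feature value for one doc, falling back to a default when
    // the doc has no value in the column.
    static float featureValue(ForwardNumericValues dv, int docId, float defaultValue) {
        return dv.advanceExact(docId) ? (float) dv.longValue() : defaultValue;
    }
}
```

With the older get(docId) style a missing value could not be told apart from 
0; with advanceExact the caller checks the returned boolean before reading 
the value.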

  was:
This ticket covers two performance issues and one functional/performance issue 
that I found while integrating LTR into our e-commerce search engine: 
 # FieldValueFeature doesn't support pure DocValues fields (stored=false). 
Please also note that for fields that are both stored and DocValues it is not 
optimal, because it extracts just one field from the stored document. 
DocValues are faster for such use cases. Below are screenshots of JFR profiles 
without and with the new DocValues support, for the case when the value can be 
read from DocValues. 
 !LTRwithoutDVOptimisation.png! 
 !LTRwithDVOptimisation.png!
 # SolrFeature was not implemented optimally for the case when no fq parameter 
is passed. I'm not sure why the fq parameter was introduced for SolrFeature at 
all, so I decided not to change the behavior and only optimized the described 
case. !LTRSolrFeatureBefore.png! !LTRSolrFeatureAfter.png!
 # LTRScoringModel was a mutable object. This led to the hash code being 
recalculated on each query, which can take a lot of time when the model is big 
(in our case we were using LambdaMART with 100 trees and leaves, which took 
3MB on disk). So I decided to make LTRScoringModel immutable and cache the 
hashCode calculation. Below are the screenshots before and after. 
!LTRModelHashCodeBefore.png! !LTRModelHashCodeAfter.png!

In our case, we had a feature.json file with 8 FieldValueFeatures, 5 
SolrFeatures and 1 OriginalScoreFeature. 
Before the optimizations, the performance overhead of LTR reranking for the 
top 48 documents was 300ms. With all the optimizations applied, it decreased 
to 35ms. 

Please also note that the JFR screenshots were captured on the Solr 6.6 
codebase. All the numbers were also measured on Solr 6.6. 
I hope that the changes to the DocValues interface (the get() method was 
removed and advanceExact was added) won't affect this (at least for 
DenseNumericDocValues it will work as expected).


> LTR Multiple performance fixes + pure DocValues support for FieldValueFeature
> -----------------------------------------------------------------------------
>
>                 Key: SOLR-12688
>                 URL: https://issues.apache.org/jira/browse/SOLR-12688
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: contrib - LTR
>            Reporter: Stanislav Livotov
>            Priority: Major
>         Attachments: LTRModelHashCodeAfter.png, LTRModelHashCodeBefore.png, 
> LTRSolrFeatureAfter.png, LTRSolrFeatureBefore.png, LTRwithDVOptimisation.png, 
> LTRwithoutDVOptimisation.png, MultiplePerformanceFixes.patch



