Re: Index-time boosting: Deprecated setBoost method

Uwe Schindler Fri, 18 Oct 2019 12:19:40 -0700

Hi,

Read my original email! The index time values are written using 
NumericDocValuesField. The expressions docs also refer to that when the 
bindings are documented.


It's separate from the indexed data (TextField). Think of it like an additional 
numeric field in your database table with a factor in each row.
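
To make the database-table picture concrete, here is a minimal plain-Java sketch. It is not Lucene API code: the class IndexTimeFactorSketch, the Row type, the clicks field and the formula textScore * (1 + ln(1 + clicks)) are all assumptions chosen for illustration. In Lucene the stored factor would be written with NumericDocValuesField and the query-time step would be done by a FunctionScoreQuery, as discussed in this thread.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexTimeFactorSketch {

    /** One document "row": a query-time text score plus a factor stored at index time. */
    static final class Row {
        final String id;
        final double textScore; // stands in for the score of the inner text query
        final long clicks;      // stands in for a per-document numeric doc value (hypothetical)

        Row(String id, double textScore, long clicks) {
            this.id = id;
            this.textScore = textScore;
            this.clicks = clicks;
        }
    }

    /** Query-time step: fold the stored factor into the text score (the formula is an assumption). */
    static double boostedScore(Row row) {
        return row.textScore * (1.0 + Math.log1p(row.clicks));
    }

    /** Returns document ids ordered by boosted score, best first. */
    static List<String> rank(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort((x, y) -> Double.compare(boostedScore(y), boostedScore(x)));
        List<String> ids = new ArrayList<>();
        for (Row row : sorted) {
            ids.add(row.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("a", 2.0, 0));   // best text match, never clicked
        rows.add(new Row("b", 1.5, 100)); // weaker text match, popular
        rows.add(new Row("c", 1.0, 3));
        System.out.println(rank(rows));
    }
}
```

The stored factor never touches the text data: changing clicks for a row changes only the boost, and swapping the formula needs no re-indexing of the text.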

Uwe

On October 18, 2019 7:14:03 PM UTC, [email protected] wrote:
>Uwe,-
>
>Two questions there:
>
>I guess this is applicable to TextField, too.
>
>And I was expecting an index writer object in the example for
>index-time boosting.
>
>Best regards
>
>
>On 10/18/19 2:57 PM, Uwe Schindler wrote:
>> Sorry I was imprecise. It's a mix of both. The factors are stored per
>> document in the index (this is why I called it index time). At query
>> time the expressions use these index-time values to fold them into
>> the query boost.
>>
>> What's your problem with that approach?
>>
>> Uwe
>>
>> On October 18, 2019 6:50:40 PM UTC, [email protected] wrote:
>>> Uwe,-
>>>
>>> Thanks. If possible, I am looking for a pure Java methodology to do
>>> the index-time boosting.
>>>
>>> This example looks like a search time boosting example:
>>>
>>>
>>> https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html
>>>
>>>
>>>
>>> Best regards
>>>
>>> On 10/18/19 2:31 PM, Uwe Schindler wrote:
>>>> Hi,
>>>>
>>>>> Is there a working example for this? Is this mentioned in the
>>>>> Lucene Javadocs or any other docs so that I can look it up?
>>>> To index the docvalues, see NumericDocValuesField (it can be added
>>>> to documents like indexed or stored fields). You may have used them
>>>> for sorting already.
>>>>> This methodology seems to discourage using index-time boosting.
>>>> Not really. Many use this all the time. It's one of the killer
>>>> features of both Solr and Elasticsearch. The problem was how
>>>> Document.setBoost() worked (it did not work correctly, see below).
>>>>> The previous setBoost method call was fine and easy to use.
>>>>> Did it have some performance issues, and is that why it was
>>>>> deprecated?
>>>> No, it was deprecated for several reasons. setBoost was not doing
>>>> what users expected. Internally the boost value was just multiplied
>>>> into the document norm factor (which is internally also a docvalues
>>>> field). The norm factors are only very imprecise floats stored in a
>>>> byte, so precision is poor. If you put some values into it and the
>>>> length norm was already consuming all bits, the boosting was very
>>>> coarse. The boost was also only ever multiplied in, whereas most
>>>> users want to do things like record click counts in the index and
>>>> then boost with, for example, the logarithm or some other function.
>>>> If the boost is just multiplied into the length norm you have no
>>>> flexibility at all.
>>>> In addition you can have several docvalues fields and use their
>>>> values in a function (e.g. one field with click count and another
>>>> one with product price). You can then combine click count and price
>>>> (which can be modified independently during index updates) so that
>>>> lower prices and higher click counts are boosted up.
>>>> This is what you can do with the expressions module. You just give
>>>> it a function.
>>>> Here is an example; the second example uses a FunctionScoreQuery
>>>> that modifies the score based on the function and the given
>>>> docvalues:
>>>> https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html
>>>>> FunctionScoreQuery usage with MultiFieldQueryParser would also be
>>>>> nice, where MultiFieldQuery already has a boosts field to do this
>>>>> in its constructor.
>>>> The boosts in the query parser are applied to fields at query time
>>>> (to have a different weight per field). Index-time boosting is per
>>>> document. So you can combine both.
>>>>> Maybe it is not needed with MultiFieldQueryParser.
>>>> You use MultiFieldQueryParser to adjust weights of the fields (e.g.
>>>> title versus body). The parsed query is then wrapped with an
>>>> expression that modifies the score per document according to the
>>>> docvalues.
>>>> Uwe
>>>>
>>>>> On 10/18/19 1:28 PM, Uwe Schindler wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> that's not true. You can do index-time boosting, but you need to
>>>>>> do that using a separate field. You just index a numeric docvalues
>>>>>> field (which may contain a long or float value per document).
>>>>>> Later you wrap your query with some FunctionScoreQuery (e.g., use
>>>>>> the Javascript function query syntax in the expressions module).
>>>>>> This allows you to compile a JavaScript function that calculates
>>>>>> the final score from the score returned by the inner query
>>>>>> combined with the docvalues that were indexed per document.
>>>>>> Uwe
>>>>>>
>>>>>> -----
>>>>>> Uwe Schindler
>>>>>> Achterdiek 19, D-28357 Bremen
>>>>>> https://www.thetaphi.de
>>>>>> eMail: [email protected]
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: [email protected] <[email protected]>
>>>>>>> Sent: Friday, October 18, 2019 5:28 PM
>>>>>>> To: [email protected]
>>>>>>> Cc: [email protected]
>>>>>>> Subject: Re: Index-time boosting: Deprecated setBoost method
>>>>>>>
>>>>>>> It looks like index-time boosting (field) is not possible since
>>>>>>> Lucene version 7.7.2. I was previously using BoostQuery at search
>>>>>>> time for boosting in another case, and this seems to be the only
>>>>>>> boosting option now in Lucene.
>>>>>>>
>>>>>>> Best regards
>>>>>>>
>>>>>>>
>>>>>>> On 10/18/19 10:01 AM, [email protected] wrote:
>>>>>>>> Hi,-
>>>>>>>>
>>>>>>>> I saw this in the Field class docs and I am trying to figure out
>>>>>>>> the following note in the docs:
>>>>>>>>
>>>>>>>> setBoost(float boost)
>>>>>>>> Deprecated.
>>>>>>>> Index-time boosts are deprecated, please index index-time scoring
>>>>>>>> factors into a doc value field and combine them with the score at
>>>>>>>> query time using eg. FunctionScoreQuery.
>>>>>>>>
>>>>>>>> I appreciate this note. Is there an example about this? I wish the
>>>>>>>> docs would give a simple example to further help.
>>>>>>>>
>>>>>>>>
>>>>>>>> https://lucene.apache.org/core/6_6_0//core/org/apache/lucene/document/Field.html
>>>>>>>> vs
>>>>>>>>
>>>>>>>>
>>>>>>>> https://lucene.apache.org/core/7_7_2/core/org/apache/lucene/document/Field.html
>>>>>>>> Best regards
>>>>>>>>
>>>>>>>>
>>>
>---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: [email protected]
>>>>>>> For additional commands, e-mail:
>[email protected]
>> --
>> Uwe Schindler
>> Achterdiek 19, 28357 Bremen
>> https://www.thetaphi.de

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de
