Best regards
On 10/21/19 12:54 PM, Uwe Schindler wrote:
Hi,
As I said before, that is a misuse of index-time boosting. In addition, in
previous versions it did not even work correctly: because of query
normalization it was normalized away anyway. And on top of that, to change
it you have to reindex.
What you intend to do is a typical use case for query time boosting with
BoostQuery. That is explained in almost every book about search, like those
about Solr or Elasticsearch.
Most query parsers also let you add boost factors for fields, e.g.
SimpleQueryParser (for humans that need a simple syntax without fields).
There you give a list of fields and boost factors.
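For readers following along, here is a minimal sketch of what query-time boosting of field1 over field2 could look like with BoostQuery. The class name QueryTimeBoostSketch and the build helper are made up for illustration; the field names and the 2.0 factor are taken from the setBoost snippet quoted in this thread, and the Lucene core jar is assumed to be on the classpath.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.BoostQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// Hypothetical helper class, not part of Lucene.
final class QueryTimeBoostSketch {

    // Builds "field1:term boosted by 2.0 OR field2:term".
    static BooleanQuery build(String term) {
        // Matches in field1 count twice as much as matches in field2.
        Query boostedField1 = new BoostQuery(new TermQuery(new Term("field1", term)), 2.0f);
        // A boost of 1.0 is the default, so field2 needs no wrapper.
        Query plainField2 = new TermQuery(new Term("field2", term));
        return new BooleanQuery.Builder()
            .add(boostedField1, BooleanClause.Occur.SHOULD)
            .add(plainField2, BooleanClause.Occur.SHOULD)
            .build();
    }

    public static void main(String[] args) {
        System.out.println(build("lucene"));
    }
}
```

Because the factor lives in the query and not in the index, changing it does not require reindexing.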
Uwe
-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de
-----Original Message-----
From: baris.ka...@oracle.com <baris.ka...@oracle.com>
Sent: Monday, October 21, 2019 6:45 PM
To: java-user@lucene.apache.org
Cc: baris.kazar <baris.ka...@oracle.com>
Subject: Re: Index-time boosting: Deprecated setBoost method
Hi,-
Thanks, and I appreciate the discussion.
Let me ask it this way; I think I gave too much information at once.
Currently I have this:
Field f1= new TextField("field1", "string1", Field.Store.YES);
doc.add(f1);
f1.setBoost(2.0f);
Field f2 = new TextField("field2", "string2", Field.Store.YES);
doc.add(f2);
f2.setBoost(1.0f);
But this fails with Lucene 7.7.2.
Probably it is more efficient and more flexible to fix this by using
BoostQuery.
However, what would the fix look like with index-time boosting? The code
in my previous post was trying to do that.
Best regards
On 10/21/19 12:34 PM, Uwe Schindler wrote:
Hi,
sorry, I don't fully understand what you intend to do. If the boost values
per field are static and used with exactly the same value for every
document, they are not needed at index time. You can just boost the field
on the query side (e.g. using BoostQuery). Boosting every document with the
same static values is an anti-pattern; that is better suited for the query
side, as you are more flexible there.
If you need a different boost value per document, you can save that boost
value in the index per document using a docvalues field (this consumes
extra space, of course). Then you need an expression-based query on the
query side. But just because it looks like Javascript, it's not slow. The
syntax is compiled to bytecode and directly included into the query
execution as a dynamic Java class, so it's very fast.
In short:
- If you need a different boost factor per field name that's constant for
  all documents, apply it at query time with BoostQuery.
- If you have to boost specific documents (e.g., top selling products),
  index a numeric docvalues field per document. On the query side you can
  use different query types to modify the score of each result based on the
  docvalues field. That can be done with the expressions module (using
  compiled Javascript) or by another query in Lucene that operates on
  ValueSource (e.g., FunctionQuery). The first one is easier to use for
  complex formulas.
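A rough end-to-end sketch of the per-document case, for illustration only. It uses FunctionScoreQuery.boostByValue with a DoubleValuesSource instead of the expressions module, which I believe is available in recent 7.x releases (treat that as an assumption and check your version's Javadocs). The class name, the "_boost" field name and the sample documents are made up; lucene-core and lucene-queries are assumed to be on the classpath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.function.FunctionScoreQuery;
import org.apache.lucene.search.DoubleValuesSource;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical demo class, not part of Lucene.
final class DocValuesBoostSketch {

    // Index time: the text goes into a TextField, the per-document
    // factor into a separate numeric docvalues field.
    static Document doc(String text, long boost) {
        Document d = new Document();
        d.add(new TextField("field1", text, Field.Store.YES));
        d.add(new NumericDocValuesField("_boost", boost));
        return d;
    }

    // Indexes two documents and returns the stored text of the best hit.
    static String topHit() {
        try (Directory dir = new ByteBuffersDirectory()) {
            try (IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                w.addDocument(doc("plain product", 1L));
                w.addDocument(doc("popular product", 5L));
            }
            try (DirectoryReader r = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(r);
                Query base = new TermQuery(new Term("field1", "product"));
                // Query time: multiply each hit's score by its "_boost" docvalue.
                Query boosted = FunctionScoreQuery.boostByValue(
                    base, DoubleValuesSource.fromLongField("_boost"));
                TopDocs top = searcher.search(boosted, 1);
                return searcher.doc(top.scoreDocs[0].doc).get("field1");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(topHit());
    }
}
```

Both documents match "product" equally well on their own, so the document with the larger "_boost" value should come out on top.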
Uwe
-----Original Message-----
From: baris.ka...@oracle.com <baris.ka...@oracle.com>
Sent: Monday, October 21, 2019 5:17 PM
To: java-user@lucene.apache.org
Cc: baris.kazar <baris.ka...@oracle.com>
Subject: Re: Index-time boosting: Deprecated setBoost method
Hi,-
Sorry about the missing parts in my previous post; please accept my
apologies for that.
I needed to add a few more questions/corrections/additions to the
previous post:
The main question was: if the boost is a single constant value, do we need
the Javascript part below?
=== Indexing code snippet for Lucene version 6.6.0 and before===
Document doc = new Document();
Field f1= new TextField("field1", "string1", Field.Store.YES);
doc.add(f1);
f1.setBoost(2.0f);
Field f2 = new TextField("field2", "string2", Field.Store.YES);
doc.add(f2);
f2.setBoost(1.0f);
=== end of indexing code snippet for Lucene version 6.6.0 and before ===
This turns into the following, where the _boost1 field is associated with
field1 and the _boost2 field is associated with field2:
In Indexing code:
=== beginning of indexing code snippet ===
Field f1 = new TextField("field1", "string1", Field.Store.YES);
doc.add(f1);
// name the docvalues field _boost1 so it matches the binding used at query time
Field _boost1 = new NumericDocValuesField("_boost1", 2L);
doc.add(_boost1);
// If this boost value needs to be stored, a separate StoredField
// instance needs to be added as well
… ( i will post this soon)
Field _boost2 = new NumericDocValuesField("_boost2", 1L);
doc.add(_boost2);
// If this boost value needs to be stored, a separate StoredField
// instance needs to be added as well
… ( i will post this soon)
=== end of indexing code snippet ===
Now, in the searching code (i.e., at query time), do I still need the
FunctionScoreQuery, given that in this case the boost is just a constant
value and not a function? However, a constant value can be argued to be a
function that returns the same value all the time, too.
=== beginning of query time code snippet ===
Expression expr = JavascriptCompiler.compile("_boost1 + _boost2");
// SimpleBindings just maps variables to SortField instances
SimpleBindings bindings = new SimpleBindings();
bindings.add(new SortField("_boost1", SortField.Type.LONG));
// These have to be LONG type i think, since NumericDocValuesField accepts
// "long" type only, am i right? Can this be DOUBLE type?
bindings.add(new SortField("_boost2", SortField.Type.LONG));
// same question here
// create a query that matches based on field1:term_to_look_for but
// scores using expr
Query query = new FunctionScoreQuery(
    new TermQuery(new Term("field1", "term_to_look_for")),
    expr.getDoubleValuesSource(bindings));
searcher.search(query, 10);
=== end of code snippet ===
Best regards
On 10/21/19 11:05 AM, baris.ka...@oracle.com wrote:
Hi,-
i would like to ask the following to make it clearer (for me at least):
Document doc = new Document();
Field f1= new TextField("field1", "string1", Field.Store.YES);
doc.add(f1);
f1.setBoost(2.0f);
Field f2 = new TextField("field2", "string2", Field.Store.YES);
doc.add(f2);
f2.setBoost(1.0f);
This turns into the following, where the _boost1 field is associated with
field1 and the _boost2 field is associated with field2:
In Indexing code:
Field f1= new TextField("field1", "string1", Field.Store.YES);
Field _boost1 = new NumericDocValuesField("field1", 2L);
doc.add(_boost1);
// If this boost value needs to be stored, a separate StoredField
// instance needs to be added as well
… ( i will post this soon)
Field _boost2 = new NumericDocValuesField("field2", 1L);
doc.add(_boost2);
// If this boost value needs to be stored, a separate StoredField
// instance needs to be added as well
… ( i will post this soon)
Now, in the searching code (i.e., at query time), do I still need the
FunctionScoreQuery, given that in this case the boost is just a constant
value and not a function? However, a constant value can be argued to be a
function that returns the same value all the time, too.
Expression expr = JavascriptCompiler.compile("_boost");
// SimpleBindings just maps variables to SortField instances
SimpleBindings bindings = new SimpleBindings();
bindings.add(new SortField("_boost1", SortField.Type.SCORE));
// create a query that matches based on body:contents but
// scores using expr
Query query = new FunctionScoreQuery(
new TermQuery(new Term("field1", "term_to_look_for")),
expr.getDoubleValuesSource(bindings));
searcher.search(query, 10);
So, if boost is a single constant value, do we need the Javascript
part above?
Best regards
On 10/18/19 4:07 PM, baris.ka...@oracle.com wrote:
Uwe,-
can the doc example that you also gave,
https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html
be extended with the NumericDocValuesField part that has to be done at
indexing time, too?
I see now why you said that this is a mixed type of boosting (i.e.,
both indexing time and search time).
I then need to include the query from this example, applied to the
_score field (I would call it the _boost field in my case), in my
overall BooleanQuery.
I will now try to combine these and post the result here for future
reference.
Best regards
On 10/18/19 3:18 PM, Uwe Schindler wrote:
Hi,
Read my original email! The index time values are written using
NumericDocValuesField. The expressions docs also refer to that when the
bindings are documented.
It's separate from the indexed data (TextField). Think of it like an
additional numeric field in your database table with a factor in each row.
Uwe
On October 18, 2019 7:14:03 PM UTC, baris.ka...@oracle.com wrote:
Uwe,-
Two questions there:
I guess this is applicable to TextField, too.
And I was expecting an index writer object in the example for index-time
boosting.
Best regards
On 10/18/19 2:57 PM, Uwe Schindler wrote:
Sorry, I was imprecise. It's a mix of both. The factors are stored per
document in the index (this is why I called it index time). At query time
the expression uses those index-time values and folds them into the query
boost.
What's your problem with that approach?
Uwe
On October 18, 2019 6:50:40 PM UTC, baris.ka...@oracle.com wrote:
Uwe,-
Thanks, if possible I am looking for a pure Java methodology to do the
index-time boosting.
This example looks like a search-time boosting example:
https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html
Best regards
On 10/18/19 2:31 PM, Uwe Schindler wrote:
Hi,
Is there a working example for this? Is this mentioned in the Lucene
Javadocs or any other docs so that I can look it up?
To index the docvalues, see NumericDocValuesField (it can be added to
documents like indexed or stored fields). You may have used them for
sorting already.
this methodology seems sort of like discouraging using index time
boosting.
Not really. Many use this all the time. It's one of the killer features of
both Solr and Elasticsearch. The problem was how Document.setBoost()
worked (it did not work correctly, see below).
Previous setBoost method call was fine and easy to use.
Did it have some performance issues, and is that why it was deprecated?
No, it was deprecated for several reasons: setBoost was not doing what
the user had expected. Internally the boost value was just multiplied into
the document norm factor (which is internally also a docvalues field). The
norm factors are only very imprecise floats stored in a byte, so precision
is poor. If you put some values into it and the length norm was already
consuming all bits, the boosting was very coarse. The boost was also only
multiplied in, whereas most users want to do things like record click
counts in the index and then boost, for example, with the logarithm or
some other function. If the boost is just multiplied into the length norm
you have no flexibility at all.
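To make the precision point concrete, here is a toy sketch in plain Java. The quantize helper is hypothetical: it keeps only the top 3 mantissa bits of a float and is NOT Lucene's actual norm encoding, and the 1/sqrt(numTerms) length norm is likewise an assumption chosen for illustration. It only shows how nearby boosts can collapse to the same stored value once they are multiplied into a coarsely stored norm.

```java
// Hypothetical demo class; a toy model, not Lucene code.
final class NormPrecisionSketch {

    // Keeps the sign, the 8 exponent bits and only the top 3 mantissa
    // bits of a float, discarding the remaining 20 mantissa bits.
    static float quantize(float f) {
        return Float.intBitsToFloat(Float.floatToIntBits(f) & 0xFFF00000);
    }

    // Multiplies the boost into an assumed length norm, then stores it coarsely.
    static float storedNorm(int numTerms, float boost) {
        float lengthNorm = (float) (1.0 / Math.sqrt(numTerms));
        return quantize(lengthNorm * boost);
    }

    public static void main(String[] args) {
        // For a 4-term field, boosts 1.0, 1.05 and 1.1 end up identical.
        for (float boost : new float[] {1.0f, 1.05f, 1.1f, 2.0f}) {
            System.out.println("boost " + boost + " -> stored " + storedNorm(4, boost));
        }
    }
}
```

In this toy model a boost has to move the product across a quantization step before it has any effect, which is the kind of coarseness described above.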
In addition you can have several docvalues fields and use their values in
a function (e.g. one field with click count and another one with product
price). After that you can combine click count and price (which can be
modified independently during index updates) and change the boost so that
a lower price and a higher click count are boosted up.
This is what you can do with the expressions module. You just give it a
function.
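As a worked illustration of such a function, here is the arithmetic in plain Java. The formula (base score times the logarithm of the click count, divided by the price) and all names are hypothetical, picked only to show the shape of the idea; a real setup would hand a comparable formula to the expressions module as a string, in the style of the Javadoc's "sqrt(_score) + ln(popularity)" example.

```java
// Hypothetical demo class; plain arithmetic, no Lucene involved.
final class ClickPriceBoostSketch {

    // More clicks raise the score (logarithmically), a higher price lowers it.
    // log1p(x) = ln(1 + x), so a click count of 0 gives a factor of 0.
    static double boostedScore(double baseScore, long clickCount, double price) {
        return baseScore * Math.log1p(clickCount) / price;
    }

    public static void main(String[] args) {
        double base = 1.0; // stands in for the score of the inner text query
        System.out.println(boostedScore(base, 10, 20.0));   // few clicks, higher price
        System.out.println(boostedScore(base, 1000, 20.0)); // many clicks, higher price
        System.out.println(boostedScore(base, 1000, 10.0)); // many clicks, lower price
    }
}
```

Since click count and price live in separate docvalues fields, either one can change without touching the other or the formula.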
Here is an example; the second example uses a FunctionScoreQuery that
modifies the score based on the function and the given docvalues:
https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html
FunctionScoreQuery usage with MultiFieldQueryParser would also be nice,
where MultiFieldQueryParser already has a boosts field to do this in its
constructor.
The boosts in the query parser are applied to fields at query time (to
have a different weight per field). Index time boosting is per document.
So you can combine both.
Maybe it is not needed with MultiFieldQueryParser.
You use MultiFieldQueryParser to adjust weights of the fields (e.g. title
versus body). The parsed query is then wrapped with an expression that
modifies the score per document according to the docvalues.
Uwe
On 10/18/19 1:28 PM, Uwe Schindler wrote:
Hi,
that's not true. You can do index time boosting, but you need to do that
using a separate field. You just index a numeric docvalues field (which
may contain a long or float value per document). Later you wrap your query
with some FunctionScoreQuery (e.g., use the Javascript function query
syntax in the expressions module). This allows you to compile a Javascript
function that calculates the final score based on the score returned by
the inner query and combines it with the docvalues that were indexed per
document.
Uwe
-----Original Message-----
From: baris.ka...@oracle.com <baris.ka...@oracle.com>
Sent: Friday, October 18, 2019 5:28 PM
To: java-user@lucene.apache.org
Cc: baris.ka...@oracle.com
Subject: Re: Index-time boosting: Deprecated setBoost method
It looks like index-time (field) boosting is not possible since Lucene
version 7.7.2. I was already using BoostQuery at search time for boosting
in another case, and this seems to be the only boosting option now in
Lucene.
Best regards
On 10/18/19 10:01 AM, baris.ka...@oracle.com wrote:
Hi,-
I saw this in the Field class docs and I am trying to figure out the
following note:
setBoost(float boost)
Deprecated.
Index-time boosts are deprecated, please index index-time scoring factors
into a doc value field and combine them with the score at query time using
eg. FunctionScoreQuery.
I appreciate this note. Is there an example of this? I wish the docs gave a
simple example to help further.
https://lucene.apache.org/core/6_6_0//core/org/apache/lucene/document/Field.html
vs
https://lucene.apache.org/core/7_7_2/core/org/apache/lucene/document/Field.html
Best regards
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org