[
https://issues.apache.org/jira/browse/SOLR-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shingo Sasaki updated SOLR-7335:
--------------------------------
Description:
A multivalued field gets the wrong norm when the field value is tokenized, the
field or document is boosted, and the field is not the source of a copyField.
{noformat}
$ java -jar start.jar &
$ echo '{
"add": {
"doc": {
"id":"no-boosted",
"features": ["a","b","c"],
"dyn_not_copied_txt": ["a","b","c"]
}
},
"add": {
"boost": 10,
"doc": {
"id":"boosted",
"features": ["a","b","c"],
"dyn_not_copied_txt": ["a","b","c"]
}
}}' > test.json
$ curl 'http://localhost:8983/solr/update/json?commit=true' \
  -H 'Content-type:application/json' --data-binary @test.json
{"responseHeader":{"status":0,"QTime":41}}
$ curl 'http://localhost:8983/solr/select' \
  -d 'omitHeader=true&wt=json&indent=on&q=*:*&fl=id,norm(features),norm(dyn_not_copied_txt)'
{
"response":{"numFound":2,"start":0,"docs":[
{
"id":"no-boosted",
"norm(features)":0.5,
"norm(dyn_not_copied_txt)":0.5},
{
"id":"boosted",
"norm(features)":5.0,
"norm(dyn_not_copied_txt)":512.0}]
}}
{noformat}
In the above example, "features" is the source of a copyField, whereas
"dyn_not_copied_txt" is not.
Both fields have the same type attribute (type="text_general"), the same
values ( ["a","b","c"] ), and the same boost, so they should have the same
norm in each document.
However, in the boosted document only, the field that is not a copyField
source has a far larger norm (512.0 instead of 5.0).
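The three observed values can be reproduced with a little arithmetic. The
sketch below is only a hypothesis that fits the numbers, not a statement about
the actual code path. It assumes the norm is boost / sqrt(number of tokens),
that the stored norm keeps only the exponent and the top two mantissa bits of
that float, and that in the faulty case the document boost is multiplied in
once per value instead of once per field. The function names are made up for
this illustration.

```python
import math
import struct


def quantize_norm(value: float) -> float:
    """Round a positive float down, keeping its exponent and top two
    mantissa bits (assumed precision of the stored norm)."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    bits &= ~((1 << 21) - 1)  # clear the low 21 of the 23 mantissa bits
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def length_norm(boost: float, num_tokens: int) -> float:
    """Assumed norm formula: boost divided by sqrt(token count)."""
    return boost / math.sqrt(num_tokens)


num_values = 3     # ["a", "b", "c"], one token per value
doc_boost = 10.0   # "boost": 10 on the second document

unboosted = quantize_norm(length_norm(1.0, num_values))
boosted_once = quantize_norm(length_norm(doc_boost, num_values))
# Hypothetical faulty case: the boost is applied to every value and multiplied.
boosted_per_value = quantize_norm(length_norm(doc_boost ** num_values, num_values))

print(unboosted, boosted_once, boosted_per_value)  # 0.5 5.0 512.0
```

Under these assumptions, 0.5 and 5.0 match both fields of "no-boosted" and
the "features" field of "boosted", while 512.0 matches "dyn_not_copied_txt",
which suggests the boost is being applied once per value for that field.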
> Multivalue field that is boosted on indexing time has wrong norm.
> -----------------------------------------------------------------
>
> Key: SOLR-7335
> URL: https://issues.apache.org/jira/browse/SOLR-7335
> Project: Solr
> Issue Type: Bug
> Affects Versions: 4.10, 5.0
> Reporter: Shingo Sasaki
> Priority: Critical
> Attachments: SOLR-7335.patch
>