[jira] [Comment Edited] (SOLR-10993) lots of empty highlight entries

Christoph Hack (JIRA) Tue, 11 Jul 2017 07:27:54 -0700

    [ https://issues.apache.org/jira/browse/SOLR-10993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16082268#comment-16082268 ]


Christoph Hack edited comment on SOLR-10993 at 7/11/17 2:26 PM:
----------------------------------------------------------------

Thanks for your reply, but I am not asking a question... I have already looked 
at the source and have confirmed that it is a bug, as I have written before.

Here is a simple example to reproduce the behavior:

1. Create a new core: "bin/solr create -c bug"

2. Index some documents:

{code:title=Example Data}
{"id": "D1", "prop01_txt": "foo", "prop03_txt": "foo"}
{"id": "D2", "prop02_txt": "foo", "prop04_txt": "foo"}
{"id": "D3", "prop02_txt": "foo", "prop05_txt": "foo"}
{"id": "D4", "prop03_txt": "foo", "prop06_txt": "foo"}
{"id": "D5", "prop03_txt": "foo", "prop07_txt": "foo"}
{code}


3. Query the core with the unified highlighter:

{code:title=Query}
http://localhost:8983/solr/bug/select?hl.fl=prop*_txt&hl.method=unified&hl=on&indent=on&q=foo&wt=json
{code}

{code:title=Response}
{
  "responseHeader":{
    "status":0,
    "QTime":20,
    "params":{
      "q":"foo",
      "hl":"on",
      "indent":"on",
      "hl.fl":"prop*_txt",
      "hl.method":"unified",
      "wt":"json"}},
  "response":{"numFound":5,"start":0,"docs":[
      {
        "id":"D1",
        "prop01_txt":["foo"],
        "prop03_txt":["foo"],
        "_version_":1572635524573691904},
      {
        "id":"D2",
        "prop02_txt":["foo"],
        "prop04_txt":["foo"],
        "_version_":1572635532961251328},
      {
        "id":"D3",
        "prop02_txt":["foo"],
        "prop05_txt":["foo"],
        "_version_":1572635545661603840},
      {
        "id":"D4",
        "prop03_txt":["foo"],
        "prop06_txt":["foo"],
        "_version_":1572635551479103488},
      {
        "id":"D5",
        "prop03_txt":["foo"],
        "prop07_txt":["foo"],
        "_version_":1572635557318623232}]
  },
  "highlighting":{
    "D1":{
      "prop01_txt":["<em>foo</em>"],
      "prop03_txt":["<em>foo</em>"],
      "prop02_txt":[],
      "prop04_txt":[],
      "prop05_txt":[],
      "prop06_txt":[],
      "prop07_txt":[]},
    "D2":{
      "prop01_txt":[],
      "prop03_txt":[],
      "prop02_txt":["<em>foo</em>"],
      "prop04_txt":["<em>foo</em>"],
      "prop05_txt":[],
      "prop06_txt":[],
      "prop07_txt":[]},
    "D3":{
      "prop01_txt":[],
      "prop03_txt":[],
      "prop02_txt":["<em>foo</em>"],
      "prop04_txt":[],
      "prop05_txt":["<em>foo</em>"],
      "prop06_txt":[],
      "prop07_txt":[]},
    "D4":{
      "prop01_txt":[],
      "prop03_txt":["<em>foo</em>"],
      "prop02_txt":[],
      "prop04_txt":[],
      "prop05_txt":[],
      "prop06_txt":["<em>foo</em>"],
      "prop07_txt":[]},
    "D5":{
      "prop01_txt":[],
      "prop03_txt":["<em>foo</em>"],
      "prop02_txt":[],
      "prop04_txt":[],
      "prop05_txt":[],
      "prop06_txt":[],
      "prop07_txt":["<em>foo</em>"]}}}
{code}

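As you can see, the highlighting section contains an entry for every field 
matched by hl.fl, even when the document does not have that field. In my real 
index, that adds up to about 10k entries per result item, which makes the 
response painfully slow to process.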


> lots of empty highlight entries
> -------------------------------
>
>                 Key: SOLR-10993
>                 URL: https://issues.apache.org/jira/browse/SOLR-10993
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: highlighter
>    Affects Versions: 6.6
>            Reporter: Christoph Hack
>
> I have indexed documents with lots of different text fields representing 
> different properties in Solr (version 6.6). Those text fields are indexed 
> with storeOffsetsWithPositions=true and termVectors=true to speed up 
> highlighting using the UnifiedHighlighter.
> During a search, I would like to highlight those properties, so I have set 
> hl.fl to a wildcard that matches all properties. Everything is working fine, 
> except that the responses are huge.
> Every document only has a small set of properties (let's say 10 in total, 
> with 1-2 matching ones), but in the highlighting section Solr returns a 
> dictionary with every possible property (about 10k) for every item. Nearly 
> all of the entries are empty, but decoding the keys of the map takes a 
> considerable amount of time.
> In fact, the time spent decoding these unnecessary entries is enormous. Solr 
> takes about 174ms for the search + encoding (I expect that the timing could 
> be much better), and decoding the response in Go (using the default JSON 
> package from the standard library) takes 695ms.
> I guess the offending line is somewhere around:
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/highlight/UnifiedSolrHighlighter.java#L175
> Why is Solr generating map entries for missing values in the first place?
> The question was posted on Stack Overflow before:
> https://stackoverflow.com/questions/44846220/solr-huge-and-slow-highlighting-response-with-mostly-empty-fields



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
