[ https://issues.apache.org/jira/browse/SOLR-10993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16082268#comment-16082268 ]
Christoph Hack edited comment on SOLR-10993 at 7/11/17 2:26 PM:
----------------------------------------------------------------

Thanks for your reply, but I am not asking a question... I have already looked at the source and have confirmed that it is a bug, as I have written before.

Here is a simple example to reproduce the behavior:

1. Create a new core: "bin/solr create -c bug"
2. Index some documents:
{code:title=Example Data}
{"id": "D1", "prop01_txt": "foo", "prop03_txt": "foo"}
{"id": "D2", "prop02_txt": "foo", "prop04_txt": "foo"}
{"id": "D3", "prop02_txt": "foo", "prop05_txt": "foo"}
{"id": "D4", "prop03_txt": "foo", "prop06_txt": "foo"}
{"id": "D5", "prop03_txt": "foo", "prop07_txt": "foo"}
{code}
3. Query the database with the unified highlighter:
{code:title=Query}
http://localhost:8983/solr/bug/select?hl.fl=prop*_txt&hl.method=unified&hl=on&indent=on&q=foo&wt=json
{code}
{code:title=Response}
{
  "responseHeader":{
    "status":0,
    "QTime":20,
    "params":{
      "q":"foo",
      "hl":"on",
      "indent":"on",
      "hl.fl":"prop*_txt",
      "hl.method":"unified",
      "wt":"json"}},
  "response":{"numFound":5,"start":0,"docs":[
      {
        "id":"D1",
        "prop01_txt":["foo"],
        "prop03_txt":["foo"],
        "_version_":1572635524573691904},
      {
        "id":"D2",
        "prop02_txt":["foo"],
        "prop04_txt":["foo"],
        "_version_":1572635532961251328},
      {
        "id":"D3",
        "prop02_txt":["foo"],
        "prop05_txt":["foo"],
        "_version_":1572635545661603840},
      {
        "id":"D4",
        "prop03_txt":["foo"],
        "prop06_txt":["foo"],
        "_version_":1572635551479103488},
      {
        "id":"D5",
        "prop03_txt":["foo"],
        "prop07_txt":["foo"],
        "_version_":1572635557318623232}]
  },
  "highlighting":{
    "D1":{
      "prop01_txt":["<em>foo</em>"],
      "prop03_txt":["<em>foo</em>"],
      "prop02_txt":[],
      "prop04_txt":[],
      "prop05_txt":[],
      "prop06_txt":[],
      "prop07_txt":[]},
    "D2":{
      "prop01_txt":[],
      "prop03_txt":[],
      "prop02_txt":["<em>foo</em>"],
      "prop04_txt":["<em>foo</em>"],
      "prop05_txt":[],
      "prop06_txt":[],
      "prop07_txt":[]},
    "D3":{
      "prop01_txt":[],
      "prop03_txt":[],
      "prop02_txt":["<em>foo</em>"],
      "prop04_txt":[],
      "prop05_txt":["<em>foo</em>"],
      "prop06_txt":[],
      "prop07_txt":[]},
    "D4":{
      "prop01_txt":[],
      "prop03_txt":["<em>foo</em>"],
      "prop02_txt":[],
      "prop04_txt":[],
      "prop05_txt":[],
      "prop06_txt":["<em>foo</em>"],
      "prop07_txt":[]},
    "D5":{
      "prop01_txt":[],
      "prop03_txt":["<em>foo</em>"],
      "prop02_txt":[],
      "prop04_txt":[],
      "prop05_txt":[],
      "prop06_txt":[],
      "prop07_txt":["<em>foo</em>"]}}}
{code}

As you can see, the highlighting section lists every field matched by hl.fl for every document, even though most of the entries are empty. In my case, I get about 10k entries per result item, which is painfully slow.
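
To put a number on the empty entries, a response like the one above can be decoded and pruned on the client. The following is only a minimal Go sketch, not a fix: it still pays the full decoding cost that the issue complains about. The names solrResponse and pruneEmptyHighlights are made up for this illustration, and the struct models nothing but the "highlighting" section of the response.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// solrResponse is a hypothetical type that models only the
// "highlighting" section: document id -> field name -> snippets.
type solrResponse struct {
	Highlighting map[string]map[string][]string `json:"highlighting"`
}

// pruneEmptyHighlights deletes every field whose snippet list is empty
// and returns how many entries were removed. Deleting from a map while
// ranging over it is permitted in Go.
func pruneEmptyHighlights(hl map[string]map[string][]string) int {
	removed := 0
	for _, fields := range hl {
		for name, snippets := range fields {
			if len(snippets) == 0 {
				delete(fields, name)
				removed++
			}
		}
	}
	return removed
}

func main() {
	// Shortened copy of the "D1" entry from the response above.
	body := []byte(`{"highlighting":{"D1":{
		"prop01_txt":["<em>foo</em>"],
		"prop03_txt":["<em>foo</em>"],
		"prop02_txt":[],
		"prop04_txt":[]}}}`)

	var resp solrResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		panic(err)
	}
	removed := pruneEmptyHighlights(resp.Highlighting)
	fmt.Printf("removed %d empty entries, %d left for D1\n",
		removed, len(resp.Highlighting["D1"]))
}
```

Applied to the full seven-field response, this would drop five of the seven entries for each document, which shows how much of the payload carries no information.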
> lots of empty highlight entries
> -------------------------------
>
> Key: SOLR-10993
> URL: https://issues.apache.org/jira/browse/SOLR-10993
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Components: highlighter
> Affects Versions: 6.6
> Reporter: Christoph Hack
>
> I have indexed documents with lots of different text fields representing
> different properties in Solr (version 6.6). Those text fields are indexed
> with storeOffsetsWithPositions=true and termVectors=true to speed up
> highlighting using the UnifiedHighlighter.
> During a search, I would like to highlight those properties, and I have set
> hl.fl to wildcard match all properties. Everything is working fine, except
> that the responses are huge.
> Every document only has a small set of properties (let's say 10 in total,
> with 1-2 matching ones), but Solr returns, in the highlighting section, a
> dictionary with every possible property (about 10k) for every item. Nearly
> all of the entries are empty, but decoding the keys of the map takes a
> considerable amount of time.
> In fact, the time spent decoding these unnecessary entries is enormous. Solr
> takes about 174ms for the search + encoding (I expect that the timing could
> be much better) and decoding the response in Go (using the default JSON
> package from the standard library) takes 695ms.
> I guess the offending line is somewhere around:
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/highlight/UnifiedSolrHighlighter.java#L175
> Why is Solr generating map entries for missing values in the first place?
> The question had been posted on stackoverflow before:
> https://stackoverflow.com/questions/44846220/solr-huge-and-slow-highlighting-response-with-mostly-empty-fields

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org