[ https://issues.apache.org/jira/browse/SOLR-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903755#comment-13903755 ]
Rob Tulloh commented on SOLR-5709:
----------------------------------
Thank you for the question. Let me try to explain with an example.
We use field collapsing to return all the documents associated with a single
ID. We parse the results and apply highlighting so that end users can see how
their search terms matched the returned text. In the test I am running, there
are 10 unique IDs to be found. The fact that 6 documents are duplicated should
not affect the number of unique groups returned. Indeed, we count 10 results
when we iterate over the results. I would expect the hit count (ngroups) to
reflect this as well. Here is a query result that demonstrates the issue. Note
that the grouping field is storageid (group.field=storageid).
{noformat}
[root@aggregator-1 solr]# wget -O- -q 'http://localhost:8983/solr/select?params={hl.requireFieldMatch=true&group.ngroups=true&group.limit=1000&isPartial=0&hl.simple.pre=<b>&hl.fl=*&wt=xml&hl=true&rows=1&EmsQueryId=INTERNAL&f.mailsubject2.qf=mailsubject&shards=archive-8.ems.labmanager.net:8983/solr,archive-6.ems.labmanager.net:8983/solr&start=0&q=customerid:352&f.body2.qf=body&group.field=storageid&hl.simple.post=</b>&group=true&qt=/search-any&EmsQueryTs=1392658773339}'
{noformat}
And here is the output. Note the values of matches and ngroups.
{noformat}
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int
name="QTime">39</int><lst name="params"><str
name="group.ngroups">true</str><str name="group.limit">1000</str><str
name="isPartial">0</str><str name="hl.simple.pre"><b></str><str
name="params">{hl.requireFieldMatch=true</str><str name="hl.fl">*</str><str
name="wt">xml</str><str name="hl">true</str><str name="rows">1</str><str
name="EmsQueryId">INTERNAL</str><str
name="f.mailsubject2.qf">mailsubject</str><str
name="shards">archive-8.ems.labmanager.net:8983/solr,archive-6.ems.labmanager.net:8983/solr</str><str
name="start">0</str><str name="q">customerid:352</str><str
name="f.body2.qf">body</str><str name="group.field">storageid</str><str
name="hl.simple.post"></b></str><str name="group">true</str><str
name="qt">/search-any</str><str
name="EmsQueryTs">1392658773339}</str></lst></lst><lst name="grouped"><lst
name="storageid"><int name="matches">16</int><int name="ngroups">16</int><arr
name="groups"><lst><long name="groupValue">43937</long><result name="doclist"
numFound="1" start="0" maxScore="7.0955024"><doc><str
name="contentid">43937</str><int name="senderid">12759</int><arr
name="recipientids"><int>12741</int></arr><long
name="storageid">43937</long><date
name="receiveddate">2000-12-12T11:07:00Z</date><str
name="mailfrom">[email protected]</str><str
name="envsender">[email protected]</str><str
name="mailto">[email protected] </str><int name="partitionid">1</int><str
name="indexlevel">0</str><str name="mailcc">[email protected]
[email protected] [email protected] [email protected]
[email protected] [email protected] [email protected]
[email protected] [email protected] [email protected]
[email protected] [email protected] [email protected]
[email protected] [email protected] [email protected]
</str><int name="importance">1</int><date
name="emaildate">2000-12-12T11:07:00Z</date><int
name="customerid">352</int><int name="igen1">0</int><int
name="totalsize">3780</int><bool name="isattachment">false</bool><str
name="mime">text/plain</str><int name="clusterlocationid">102</int><int
name="islandid">101</int><int name="size">2240</int><str
name="language">en</str><str name="mailsubject_en">Re: Gallup
Expansion</str><str name="mailsubject2_en">Re: Gallup Expansion</str><long
name="_version_">1460308152307154944</long><date
name="processingtime">2014-02-17T17:32:58.887Z</date></doc></result></lst></arr></lst></lst><lst
name="highlighting"><lst name="43937"/></lst>
</response>
{noformat}
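For anyone who wants to compare these counters programmatically, here is a minimal Python sketch that reads matches, ngroups, and the distinct groupValue entries out of a grouped XML response. The SAMPLE string is a hand-trimmed stand-in modeled on the output above, and the function name read_group_counts is made up for illustration; it is not part of Solr or of our code.

```python
import xml.etree.ElementTree as ET

# Hand-trimmed stand-in for a grouped response. Only the counters and one
# group are kept; everything else from the real output is omitted.
SAMPLE = """<response>
<lst name="grouped"><lst name="storageid">
<int name="matches">16</int><int name="ngroups">16</int>
<arr name="groups">
<lst><long name="groupValue">43937</long>
<result name="doclist" numFound="1" start="0"/></lst>
</arr></lst></lst>
</response>"""


def read_group_counts(xml_text, group_field):
    """Return (matches, ngroups, set of groupValue strings) for one field."""
    root = ET.fromstring(xml_text)
    field = root.find(f"./lst[@name='grouped']/lst[@name='{group_field}']")
    if field is None:
        raise ValueError(f"no grouped section for field {group_field!r}")
    matches = int(field.find("./int[@name='matches']").text)
    ngroups = int(field.find("./int[@name='ngroups']").text)
    # groupValue may be typed as <long>, <str>, etc., so match on the name only.
    values = {
        group.findtext("./*[@name='groupValue']")
        for group in field.findall("./arr[@name='groups']/lst")
    }
    return matches, ngroups, values


if __name__ == "__main__":
    matches, ngroups, values = read_group_counts(SAMPLE, "storageid")
    print(matches, ngroups, sorted(values))
```

With rows=1 only one group comes back per request, so the set of values is only a full count of distinct groups if the request asks for enough rows to cover them all.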
There are exactly 10 unique results associated with that field. I can
understand matches being 16 (the number of documents matching the query), but I
would expect ngroups to be 10, the number of unique groups returned. Our code
reads ngroups and returns it as the hit count for the query, so that we report
the number of unique hits to the caller.
I hope that makes it clear. Please let me know if I can answer any more
questions.
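To make the arithmetic concrete, here is a small Python sketch of the two ways the group count could be derived. It assumes, without confirmation from the output above, that the reported ngroups is the sum of the group counts from each shard. The storageid values are made up; only the totals (10 unique IDs, 6 duplicated documents, 16 matches) come from my test.

```python
# Hypothetical per-shard group values: 10 unique storageid values overall,
# 6 of which also exist on the second shard. The IDs are illustrative only.
shard_a = set(range(1, 11))  # 10 distinct groups on the first shard
shard_b = set(range(1, 7))   # 6 of the same groups duplicated on the second


def summed_ngroups(shards):
    """Add up per-shard group counts without de-duplicating (assumed behavior)."""
    return sum(len(groups) for groups in shards)


def distinct_ngroups(shards):
    """Count each group value once, no matter how many shards hold it."""
    return len(set().union(*shards))


if __name__ == "__main__":
    shards = [shard_a, shard_b]
    print("summed:", summed_ngroups(shards))
    print("distinct:", distinct_ngroups(shards))
```

The summed figure equals matches in my test (16), while the distinct figure is the 10 I expect to see in ngroups.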
> Highlighting grouped duplicate docs from different shards with group.limit > 1 throws ArrayIndexOutOfBoundsException
> --------------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-5709
> URL: https://issues.apache.org/jira/browse/SOLR-5709
> Project: Solr
> Issue Type: Bug
> Components: highlighter
> Affects Versions: 4.3, 4.4, 4.5, 4.6, 5.0
> Reporter: Steve Rowe
> Assignee: Steve Rowe
> Fix For: 4.7, 5.0
>
> Attachments: SOLR-5709.patch
>
>
> In a sharded (non-SolrCloud) deployment, if you index a document with the
> same unique key value into more than one shard, and then try to highlight
> grouped docs with more than one doc per group, where the grouped docs contain
> at least one duplicate doc pair, you get an AIOOBE.
> Here's the stack trace I got from such a situation, with 1 doc indexed into
> each shard in a 2-shard index, with {{group.limit=2}}:
> {noformat}
> ERROR null:java.lang.ArrayIndexOutOfBoundsException: 1
>       at org.apache.solr.handler.component.HighlightComponent.finishStage(HighlightComponent.java:185)
>       at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:328)
>       at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>       at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
>       at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:758)
>       at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:412)
>       at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:202)
>       at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
>       at org.apache.solr.client.solrj.embedded.JettySolrRunner$DebugFilter.doFilter(JettySolrRunner.java:136)
>       at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
>       at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
>       at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:229)
>       at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>       at org.eclipse.jetty.server.handler.GzipHandler.handle(GzipHandler.java:301)
>       at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1077)
>       at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
>       at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
>       at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
>       at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>       at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>       at org.eclipse.jetty.server.Server.handle(Server.java:368)
>       at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
>       at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
>       at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
>       at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
>       at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
>       at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
>       at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628)
>       at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
>       at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>       at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>       at java.lang.Thread.run(Thread.java:724)
> {noformat}
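As an illustration only, the Python sketch below shows one generic way an out-of-bounds index like the one in the trace can arise. A result array is sized by the number of unique document IDs, while slot positions are assigned per returned document, duplicates included. This is an assumption made for the example, not a description of what HighlightComponent.finishStage actually does. The function merge_highlights and its arguments are hypothetical.

```python
def merge_highlights(unique_ids, shard_entries):
    """Place highlight snippets into slots sized by the number of unique ids.

    shard_entries is a list of (doc_id, position, snippet) tuples, where
    position is the document's index in the merged response (hypothetical).
    """
    slots = [None] * len(unique_ids)  # one slot per unique id
    for doc_id, position, snippet in shard_entries:
        # Fails when duplicates push a position past the number of unique ids.
        slots[position] = (doc_id, snippet)
    return slots


if __name__ == "__main__":
    # Same unique key indexed into two shards, both copies returned in a group.
    entries = [("doc1", 0, "from shard 1"), ("doc1", 1, "from shard 2")]
    try:
        merge_highlights({"doc1"}, entries)
    except IndexError as exc:
        print("out of bounds:", exc)
```

In this toy setup, one unique ID with two returned copies means slot 1 is requested in a one-element array, which lines up with the index 1 reported in the exception message.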