[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Michael McCandless (JIRA) Sat, 04 Jun 2011 03:17:43 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044253#comment-13044253
 ]


Michael McCandless commented on SOLR-2564:
------------------------------------------

bq. I haven't been following this... 

Thank you for the review Yonik!

bq. I took a quick look at the patch, and at first blush it's hard to tell what 
changes are cleanup and what changes are cut-over.

It's definitely a big refactoring.

bq. in the QueryComponent, why the change to set the GET_SCORES flag based on 
the sort(s)?

I assume this is because if you sort or groupSort by a Sort that
contains SortField.SCORE, you need a scorer.  Not sure why Solr trunk
gets away with not doing this, though... Martijn do you know?  Oh, and
likely because the CachingCollector needs to know up front if scores
are needed.

bq. I'm not a fan of this new style for matching request parameters to enums... 
solr does a lot of lookups on a typical request, and a switch to this style 
everywhere could definitely have an impact (the whole upper-casing the request 
param so we can match it to the enum name).

The .upper() calls (twice per request, only if the request has
group=true) are negligible here (true, other situations might be
different, though we should fix them to lookup/check once).

And, this approach gives us strong checking of the enum fields.
Contrast that with what's on trunk now:

{noformat}
  String format = params.get(GroupParams.GROUP_FORMAT);
  Grouping.Format defaultFormat = "simple".equals(format) ? 
Grouping.Format.Simple : Grouping.Format.Grouped; 
{noformat}

If you have a typo in "simple", you'll unexpectedly get
Grouping.Format.Grouped.

I think we should strongly check all of our request params?

{quote}
"Accuracy" seems a bit mis-named? It seems to imply an accuracy trade-off, but 
both methods are 100% accurate here, they just do different things to serve 
different usecases. At least the name doesn't seem to have made it's way into 
the external API though.

The parameter "group.totalCount" I would expect to return the total count of 
something, not control the pre/post faceting thing? (or are the comments just 
wrong?) If it's to return the number of groups, then perhaps the name should be 
"group.groupCount" as totalCount is unit-less.
{quote}

I agree.  This setting just controls what the "total hit count" should
count -- unique groups vs docs.

How about TotalCount.GROUPS and TotalCount.DOCS?

bq. What does "group.docSet" do? The comments don't quite make sense to me, but 
the param suggests it's sort of like group.totalCount?

I think we should remove this from this patch (see my comment above --
it applies to LUCENE-3097).

bq. in the interest of reducing the number of parameters, we could dump 
group.cache and have a single group.cacheMB parameter that uses 0 as no cache, 
-1 as maximum needed (solr uses -1 in this manner in other places too), and 
other values as literal number of MB (which I'd discourage people from using 
personally).

+1, that makes sense.

bq. I'm not sure we should default group.cache to true... there's a downside to 
the memory usage, and it's fragile: things may be working just fine, and the 
user may add a few more documents to the index and then the limit is hit and it 
just stops working (but still consumes much memory and extra log warnings per 
request).

Enabling caching can make a huge improvement in QPS, especially for
queries that are costly to execute but don't match too many docs.

Maybe instead of a fixed MB we could make it a percentage of the
maxDoc?  This would make it less fragile..  So you could say you're
willing to cache up to 50% of the total docs in the index.

bq. FYI: there's a nocommit in there misspelled as "No commit"

Martijn can you fix this one...?  And in general try to spell it as
"nocommit" :) This is what our build catches.  (And, fear not, you're
not the only person to have trouble spelling nocommit!!).


> Integrating grouping module into Solr 4.0
> -----------------------------------------
>
>                 Key: SOLR-2564
>                 URL: https://issues.apache.org/jira/browse/SOLR-2564
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch
>
>
> Since work on grouping module is going well. I think it is time to wire this 
> up in Solr.
> Besides the current grouping features Solr provides, Solr will then also 
> support second pass caching and total count based on groups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

Reply via email to