[
https://issues.apache.org/jira/browse/LUCENE-4748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573318#comment-13573318
]
Shai Erera commented on LUCENE-4748:
------------------------------------
A few comments:
* I think that DrillSideways can take a DrillDownQuery (once we finish with
LUCENE-4750)?
** It will eliminate .addDrillDown (and it's ok I think that DDQ too will
enforce all passed CPs to belong to the same dimension)
** Though if we do that, how can we set minShouldMatch on sub-query?
** Maybe if DDQ itself won't wrap another Query, but just build a BQ over all
CPs ... then the user will need to wrap, but we can add a utility method.
* In .search(), just set minShouldMatch to 1 if (drillDownQueries.size() == 1)?
It reads simpler...
** Also, why do you need to add a fake Query? I understand the rewrite will
eliminate BQ and return TQ, but what's the harm?
** Isn't minShouldMatch=1 in that case similar to TQ?
* In getDimIndex:
** Extract dims.size() to a variable so it's not evaluated on every iteration?
** I think you can drop the if (cp.length > 0)? It doesn't make sense for
someone to pass an empty CP. Also, you can assert on that in .addDrillDown()
*** BTW, I noticed that you test that in DrillSidewaysCollector ctor too.
** I wonder whether making 'dims' a LinkedHashSet would perform better than
these contains() (in .addDrillDown) and get(i) calls. Then you could just do
dims.get(fr.cp.components[0]). I didn't try that in code, so I'm not sure you
can get its index...
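A minimal sketch of the lookup idea above, assuming a LinkedHashMap from dimension name to insertion index instead of a LinkedHashSet (a Set has no get() and cannot report an element's position). The class DimIndex and its methods are hypothetical and not part of the patch:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: maps each drill-down dimension to a stable index. */
final class DimIndex {
  // Insertion-ordered, so iteration order matches the order dims were added.
  private final Map<String, Integer> dims = new LinkedHashMap<>();

  /** Registers a dimension if unseen and returns its index. */
  int add(String dim) {
    Integer existing = dims.get(dim);
    if (existing != null) {
      return existing;
    }
    int index = dims.size();
    dims.put(dim, index);
    return index;
  }

  /** Returns the index of a dimension, or -1 if it was never added. */
  int indexOf(String dim) {
    Integer index = dims.get(dim);
    return index == null ? -1 : index;
  }

  int size() {
    return dims.size();
  }
}
```

With this shape, both the "already added?" check and the index lookup are a single hash lookup instead of a linear scan over a list.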
Also, I think we could simplify things if DrillSideways worked like this:
* Either exposed a .getQuery() method, or was itself a Query (like DDQ).
* Either exposed a .getCollector() method (returning DrillSidewaysCollector) or
if it was a Query, you'd just initialize a DrillSidewaysCollector (not a big
deal, user-wise).
* The collector's getFacetResults() would do the "merging" work that I see in
.search()
Then you:
* Won't need DrillSidewaysResult, which today wraps a List<FacetResult> and
TopDocs. Someone could MultiCollector.wrap(topDocsCollector,
sidewaysCollector)? Just like w/ facets?
* Won't need the multitude of search() methods. Again, someone could wrap
TopDocsCollector, CachingCollector, TopFieldCollector...
In DrillSidewaysCollector ctor:
* if (drillSidewaysRequest == null) -- that means the user asked to drill down
on some CPs for dim X, but did not request to count it, right?
** Must we throw an exception? Perhaps we can just drop the relevant Query
clause? Although it's unlikely that a user would do that ... so
perhaps keep the code for simplicity.
* Instead of doing Collections.singletonList you can just pass the single
FacetRequest to the vararg ctor. If you feel like it, we can optimize
FacetSearchParams' vararg ctors to initialize a singletonList if
facetRequests.length == 1.
* exactCount = Math.max(2, dims.size()); -- maybe add a comment why '2'?
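The vararg suggestion above could look roughly like the following. This is only a sketch: SketchSearchParams is a hypothetical stand-in, not the real FacetSearchParams, and it assumes the constructor simply stores the requests as a List:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a params class with a vararg constructor. */
final class SketchSearchParams<T> {
  final List<T> requests;

  @SafeVarargs
  SketchSearchParams(T... requests) {
    // A single request gets the cheaper immutable singleton list;
    // anything else is wrapped as a fixed-size list view of the array.
    this.requests = requests.length == 1
        ? Collections.singletonList(requests[0])
        : Arrays.asList(requests);
  }
}
```

Callers would then pass the single request directly, with no Collections.singletonList at the call site.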
In DrillSidewaysCollector.setScorer:
* Why does Scorer.getChildren() return a Collection and not List? We used to
have that in IR.listCommits while in practice it was always a List. Can we fix
Scorer?
** I looked at all Scorer.getChildren() impls and they either return a List
(ArrayList in most cases) or Collections.singleton (which is a Set). So it's
indeed dangerous to assume it's a List, but I think we should just fix Scorer?
* What do you mean by "// nocommit fragile: need tracker somehow..."? What's
tracker?
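To illustrate why assuming a List is dangerous when the declared type is Collection, here is a small sketch in plain Java, with no Lucene types involved. ChildrenAccess is a hypothetical helper: it reuses the collection when it already is a List and copies it otherwise, which avoids the ClassCastException a blind cast would hit on Collections.singleton:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical helper: safe positional access to a Collection. */
final class ChildrenAccess {
  /** Returns the same instance if it is a List, otherwise a List copy. */
  static <T> List<T> asList(Collection<T> children) {
    if (children instanceof List) {
      return (List<T>) children;
    }
    // e.g. Collections.singleton(...) returns a Set, so copy it.
    return new ArrayList<>(children);
  }
}
```

Changing the return type to List at the source would make such a helper unnecessary.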
In DrillSidewaysCollector.collect:
* Can you add some documentation to the 'if-else'?
> Add DrillSideways helper class to Lucene facets module
> ------------------------------------------------------
>
> Key: LUCENE-4748
> URL: https://issues.apache.org/jira/browse/LUCENE-4748
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 4.2, 5.0
>
> Attachments: LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch,
> LUCENE-4748.patch
>
>
> This came out of a discussion on the java-user list with subject
> "Faceted search in OR": http://markmail.org/thread/jmnq6z2x7ayzci5k
> The basic idea is to count "near misses" during collection, ie
> documents that matched the main query and also all except one of the
> drill down filters.
> Drill sideways makes for a very nice faceted search UI because you
> don't "lose" the facet counts after drilling in. Eg maybe you do a
> search for "cameras", and you see facets for the manufacturer, so you
> drill into "Nikon".
> With drill sideways, even after drilling down, you'll still get the
> counts for all the other brands, where each count tells you how many
> hits you'd get if you changed to a different manufacturer.
> This becomes more fun if you add further drill-downs, eg maybe I next drill
> down into "Resolution=10 megapixels", and then I can see how many 10
> megapixel cameras all other manufacturers offer, and what other resolutions
> Nikon cameras offer.
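The "near miss" counting described above can be sketched in plain Java as follows. This is an illustration under simplifying assumptions, not the patch's implementation: documents are maps from dimension to a single value, all of them are assumed to already match the main query, and NearMissSketch is a hypothetical name:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of drill-sideways counting over in-memory docs. */
final class NearMissSketch {
  /**
   * For each drilled dimension, counts values over docs that pass the
   * drill-down filters of all OTHER dimensions.
   */
  static Map<String, Map<String, Integer>> sidewaysCounts(
      List<Map<String, String>> matchingDocs,
      Map<String, Set<String>> drillDowns) {
    Map<String, Map<String, Integer>> counts = new HashMap<>();
    for (String dim : drillDowns.keySet()) {
      counts.put(dim, new HashMap<>());
    }
    for (Map<String, String> doc : matchingDocs) {
      int failures = 0;
      String failedDim = null;
      for (Map.Entry<String, Set<String>> e : drillDowns.entrySet()) {
        String value = doc.get(e.getKey());
        if (value == null || !e.getValue().contains(value)) {
          failures++;
          failedDim = e.getKey();
        }
      }
      if (failures == 0) {
        // Full hit: counts toward every drilled dimension.
        for (String dim : drillDowns.keySet()) {
          increment(counts.get(dim), doc.get(dim));
        }
      } else if (failures == 1) {
        // Near miss: counts only toward the one dimension it failed.
        increment(counts.get(failedDim), doc.get(failedDim));
      }
      // Two or more failures: the doc contributes to no sideways count.
    }
    return counts;
  }

  private static void increment(Map<String, Integer> m, String value) {
    if (value != null) {
      m.merge(value, 1, Integer::sum);
    }
  }
}
```

In the camera example, a Canon camera with 10 megapixels fails only the manufacturer filter, so it is counted under Canon in the manufacturer facet even though it is not a hit.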