[
https://issues.apache.org/jira/browse/LUCENE-4748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated LUCENE-4748:
---------------------------------------
Attachment: LUCENE-4748.patch
Thank you for all the feedback Shai! Responses:
bq. I think that DrillSideways can take a DrillDownQuery (once we finish with
LUCENE-4750)?
+1
DrillSideways will still need to do its own thing (build the BQ with
minShouldMatch), but it should just take a DDQ.
bq. In .search(), just set minShouldMatch to 1 if (drillDownQueries.size() ==
1)? It reads simpler...
Fixed.
bq. Also, why do you need to add a fake Query? I understand the rewrite will
eliminate BQ and return TQ, but what's the harm?
bq. Isn't minShouldMatch=1 in that case similar to TQ?
If we don't add the fake query then BQ of a single term will just
rewrite itself to that single term, so we won't find our scorer.
Really we should make a specialized collector for this case (I added a
TODO) because you should just use query foobar OR drill-down, and then
check only the drill-down scorer to see if it matched.
But I think we can do that later (it's an opto).
bq. In getDimIndex:
bq. Extract dims.size() to a variable so it's not executed in every loop?
Fixed.
bq. I think you can drop the if (cp.length > 0)? It doesn't make sense for
someone to pass an empty CP. Also, you can assert on that in .addDrillDown()
bq. BTW, I noticed that you test that in DrillSidewaysCollector ctor too.
OK I'll move the check into .addDrillDown, but I think it should be a
real check (not assert).
bq. I wonder if we made 'dims' LinkedHashSet it would perform better than these
contains() (in .addDrillDown), get. Then you could just do
dims.get(fr.cp.components[0]). I didn't try that in code, so not sure if you
can get its index...
Good idea, I think LinkedHashMap should work. I'll try that...
{quote}
Also, I think we could simplify things if DrillSideways worked like this:
Either exposed a .getQuery() method, or was itself a Query (like DDQ).
Either exposed a .getCollector() method (returning DrillSidewaysCollector) or
if it was a Query, you'd just initialize a DrillSidewaysCollector (not a big
deal, user-wise).
The collector's getFacetResults() would do the "merging" work that I see in
.search()
Then you:
Won't need DrillSidewaysResult, which today wrap a List<FacetResult> and
TopDocs. Someone could MultiCollector.wrap(topDocsCollector,
sidewaysCollector)? Just like w/ facets?
Won't need the multitude of search() methods. Again, someone could wrap
TopDocsCollector, CachingCollector, TopFieldsCollector...
{quote}
Alas, it's not so simple: you can't use
MultiCollector.wrap(topDocsCollector, sidewaysCollector) because the
BooleanQuery (DrillSidewaysQuery if we do that) hits too many results
(hits + near-misses), and this is why it needs its own collector.
All other collectors you may want for your search (grouping, joining,
etc) need to be MultiCollector.wrap'd and then passed to
DrillSidewaysCollector.
So I'm not sure what value it is to provide the Query/Collector
externally: all you can really do with them is search on the query,
collecting with the collector.
I'm also torn about "opening up" the Query/Collector because we may
change the impl on how the near-miss counts are collected. EG Solr
makes bit sets and does multiple passes to aggregate the near-misses.
bq. In DrillSidewaysCollector ctor:
bq. if (drillSidewaysRequest == null) – that means the user asked to drill-down
on some CPs for dim X, but not requested to count it, right?
Right.
bq. Do we must throw an exception? Perhaps we can just drop the relevant Query
clause? Although, it's not very expected that a user would do that ... so
perhaps keep the code for simplicity.
Or, we could just return null as the FacetResult for that dimension?
The thing is, I suspect it's unusual that the app would want this, and
then in those cases that they do, they could drill-down on their main
query (ie, pass DrillDownQuery as the Query to DrillSideways) and then
we won't count the sideways counts for it.
bq. Instead of doing Collections.singletonList you can just pass the single
FacetRequest to the vararg ctor.
Ahh, good!
bq. If you feel like it, we can optimize FacetSearchParams' vararg ctors to
initialize a singletonList if facetRequests.length == 1.
I don't think we need to ... looks like we use Arrays.asList.
bq. exactCount = Math.max(2, dims.size()); – maybe add a comment why '2'?
Done.
{quote}
In DrillSidewaysCollector.setScorer:
Why does Scorer.getChildren() return a Collection and not List? We used to have
that in IR.listCommits while in practice it was always a List. Can we fix
Scorer?
I looked at all Scorer.getChildren() impls and they either return a List
(ArrayList in most cases) or Collections.singleton (which is a Set). So it's
indeed dangerous to assume it's a List, but I think we should just fix Scorer?
{quote}
I don't think we can expect Scorer.getChildren to return a List,
matching the order of the sub-clauses, in general. Ie, it will depend
on each query: sometimes a Scorer can be dropped, if that clause's
scorer was null, or the Scorers can be reordered (eg,
ConjunctionScorer reorders by expected freq i think).
bq. What do you mean by "// nocommit fragile: need tracker somehow..."? What's
tracker?
Well, we cannot just cast to List and assume number of sub-scorers is
the same as number of clauses. EG if the user's main query returns a
null Scorer because it matches no hits, then BQ will only return a
length 1 list to us.
So to properly handle this I think we need a wrapper Query class (this
original idea came from selckin I think), whose purpose is to track
which Scorer instance was created for our "near-miss" BQ.
{quote}
In DrillSidewaysCollector.collect:
Can you add some documentation to the 'if-else'?
{quote}
OK done.
> Add DrillSideways helper class to Lucene facets module
> ------------------------------------------------------
>
> Key: LUCENE-4748
> URL: https://issues.apache.org/jira/browse/LUCENE-4748
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 4.2, 5.0
>
> Attachments: LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch,
> LUCENE-4748.patch, LUCENE-4748.patch
>
>
> This came out of a discussion on the java-user list with subject
> "Faceted search in OR": http://markmail.org/thread/jmnq6z2x7ayzci5k
> The basic idea is to count "near misses" during collection, ie
> documents that matched the main query and also all except one of the
> drill down filters.
> Drill sideways makes for a very nice faceted search UI because you
> don't "lose" the facet counts after drilling in. Eg maybe you do a
> search for "cameras", and you see facets for the manufacturer, so you
> drill into "Nikon".
> With drill sideways, even after drilling down, you'll still get the
> counts for all the other brands, where each count tells you how many
> hits you'd get if you changed to a different manufacturer.
> This becomes more fun if you add further drill-downs, eg maybe I next drill
> down into Resolution=10 megapixels", and then I can see how many 10
> megapixel cameras all other manufacturers, and what other resolutions
> Nikon cameras offer.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]