[
https://issues.apache.org/jira/browse/SOLR-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584299#comment-14584299
]
Hoss Man commented on SOLR-4212:
--------------------------------
Ok, I managed to review a little over half of the the current patch (trying to
start with the small changes first and work my way up) -- comments below...
----
General feedback: This patch, on the whole, would be a lot easier to review --
and the final code a lot easier to make sense of -- if there were more
class/method javadocs explaining what the purpose of everything was. I realize
that could be said for a lot of the existing Solr code, but adding/refactoring
classses/methods is the best time to try and improve the situation since that's
when you're thining about it the most.
----
bq. "SimpleFacets.getFacetCounts() was removed because it would have been weird
for SimpleFacets to create and use its own sub-class
RangeFacetProcessor/DateFacetProcessor. However, in this patch I have replaced
that bit of code with a static method in FacetComponent which should eliminate
the repetition in MLTHandler.
If you don't think it makes sense to keep this method where it is that's fine,
but it is a public method designed for people writting custom request handlers
to use to add faceting to their stuff -- so at the very least you should leave
a deprecated stub in it's place that calls your new method.
It also confuses the hell out of me that in the process of moving this method
you made the method signature more complicated to use (more args to understand,
more exceptions to deal with) and got rid of the javadocs so the new more
complicated call signature is even harder to understand -- and I, frankly,
can't make sense of why any of that is neccessary (even if should now live in a
diff class because of the inheritence)...
* why does the caller have to pass in an empty NamedList to be mutated? why
can't the method just return a NamedList like the old method did?
* why does the caller have to instantiate the date & range processors
themselves?
** in both of the places this method is called, the processors are constructed
just to pass to this method and then thrown away.
** So why can't this static method construct them internally (using the
request, docset, params, and response builder from the SimpleFacets object
that's also passed as an arg) and keep the method signature simple?
* why does this new method catch & wrap SyntaxErrors but not IOExceptions like
the old method?
(If there reasons for these changes then great -- but they aren't clear just
from reading the patch, and there's no javadocs to get guidance from)
----
bq. I removed the RangeFacetAccumulator class and replaced it with a
LinkedHashMap of facetStr to RangeFacet. This RangeFacet class extends
FacetBase just like PivotFacetValue and is responsible for aggregation of a
single facet value.
* I don't see a class called "RangeFacet" ... it looks like you talking about
RangeFacetValue?
** FWIW: PivotFacetValue doesn't extend FacetBase because it models a single
(value,count) pair in the context of a PivotFacetField (which may be
hierarchical) ... you may be thinking of "PivotFacet"
** ... in which case, should this class _atually_ be named "RangeFacet" ?
** either way: can we please beef up the javadocs of this class to be crystal
clear about what it's modeling -- at first glance I thought it was just
modeling a facet.range *param* (with the field, start, end, calculator) before
I realized it not only models the field + the indiviual buckets, but also the
values in those buckets (but not the other params)
* Isn't RangeFacetValue.fieldName redundent here? - it's already handled by
FacetBase.facetOn.
* I don't think there is any reason to initialize shardFieldValues in
mergeContributionFromShard before the {{rangeFacet = rangeFromShard}}
short-circut, it's just wasted cycles the first time the method gets called,
correct?
----
Looking at the changes to PivotFacetValue...
* why a "Map<String, RangeFacetValue>" for rangeCounts instead of a NamedList?
** it seems like a really bad idea for these to not be returned in a
deterministic order -- it should be in the same order as the top level
facet.range results (which is hte order of the facet.range params in the
request)
** no idea if this HashMap usage currently causes problems in the shard
merging, but it smells fishy -- and either way the user should be able to count
on getting the range facets back in the same order the asked for them
* i didn't dig into the full ramifications of this, but you've got
PivotFacetValue constructing RangeFacetValue objects and passing the response
key in as the "fieldName" argument to the constructor -- those aren't the same
thing. Either the method call here is wrong, or the constructor arg should be
renamed -- either way any usage of this "fieldName" constructor arg and
internal usage should be sanity checked to ensure it's used consistnetly
everywhere
** are there tests of using a diff response key from the fieldName when hanging
ranges under pivots? (both single node and cloud?)
* why is PivotFacetValue.createFromNamedList conditionally calling
RangeFacetValue.mergeContributionFromShard?
** Unless i'm missing something, that method is only ever creating entirely new
instances of PivotFacetValue, so there shouldn't be any existing data to merge
into ... correct?
* in PivotFacetValue.mergeContributionFromShard, the dozen or so lines of
looping over each shard's response to either construct a new RangeFacetValue or
call RangeFacetValue.mergeContributionFromShard seems like it could/should be
refactored into a static helper method (in the RangeFacetValue class)
** the exact same loop is used in the only other place
mergeContributionFromShard is called (FacetComponent)
** if PivotFacetValue.createFromNamedList really does need to conditional merge
(see previuos comment), then it's loop would also be exactly the same, and it
could use the same static helper method.
----
Since the input to PivotFacetHelper.mergeQueryCounts is already in the correct
response format (ie: no special objects need constructed to model the simple
counts) can't it just return "querycounts" if "receivingMap" is null? (instead
of looping & adding)
----
With the current patch, something is wrong with how localparams are dealt with
when trying to use facet.ranges inside of pivots. These queries are examples
of doing range queries on the same field with diff gap params, and
(independently) doing a pivot...
{noformat}
http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&facet=true&facet.pivot=manu_id_s,cat&facet.range={!key=key1%20tag=t1%20facet.range.start=0%20facet.range.end=1000%20facet.range.gap=200}price&facet.range={!key=key2%20tag=t1%20facet.range.start=0%20facet.range.end=500%20facet.range.gap=100}price
http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&facet=true&facet.pivot=manu_id_s,cat&facet.range={!key=key1%20tag=t1%20facet.range.gap=200}price&f.price.facet.range.start=0&f.price.facet.range.end=1000&facet.range={!key=key2%20tag=t1%20facet.range.gap=100}price
{noformat}
...but as soon as you try to hang those ranges on the pivot you get a local
errors...
{noformat}
http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&facet=true&facet.pivot={!range=t1}manu_id_s,cat&facet.range={!key=key1%20tag=t1%20facet.range.start=0%20facet.range.end=1000%20facet.range.gap=200}price&facet.range={!key=key2%20tag=t1%20facet.range.start=0%20facet.range.end=500%20facet.range.gap=100}price
...
Missing required parameter: f.price.facet.range.start (or default:
facet.range.start)
http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&facet=true&facet.pivot={!range=t1}manu_id_s,cat&facet.range={!key=key1%20tag=t1%20facet.range.gap=200}price&f.price.facet.range.start=0&f.price.facet.range.end=1000&facet.range={!key=key2%20tag=t1%20facet.range.gap=100}price
...
Missing required parameter: f.price.facet.range.gap (or default:
facet.range.gap)
{noformat}
...the fact that this error only pops up when combining pivots + ranges
suggests that the patch didn't _break_ the existing code controlling how the
param parsing is done, it aparently just isn't re-using the exact same parsing
code in both situations? Which also means that within a single request, the
range facet params are still getting parsed (at least) twice -- once
(correctly) for the top level facets, and (at least) once (incorrectly) when
hanging off of the pivot.
bq. ...Finally I settled for a crude approach of enhancing FacetBase with tags
and just parsing the facet queries and ranges and putting them in the request
context. ...
Hmmm....
I haven't seen this piece of code yet, but it sounds like it only prevents the
"tags" from being parsed over and over again -- from what I've seen so far,
RangeFacetProcessor.getFacetRangeCounts(ParsedParams,NamedList) which delegates
to
RangeFacetProcessor.getFacetRangeCounts(SchemaField,RangeEndpointCalculator,ParsedParams)
is still going to cause a new RangeEndpointCalculator to be constructed, and
the exact same start/end/include/etc... Strings to be parsed over and over and
over again, for every pivot value, at every level. These things should really
just be parsed *once* in prepare(), put into the request context, and reused at
every level of every pivot.
To put it another way...
With the patch right now, if you do the following query using the techproducts
example data:
{noformat}
http://localhost:8983/solr/techproducts/select?rows=0&q=*:*&facet=true&facet.range=manufacturedate_dt&facet.range.start=2000-01-01T00:00:00Z&facet.range.end=2010-01-01T00:00:00Z&facet.range.gap=%2B1YEAR
{noformat}
... then 41 DateRangeEndpointCalculator objects get instantated even though 1
is plenty; the start value of "2000-01-01T00:00:00Z" and the end value
"2010-01-01T00:00:00Z" each get parsed into Date objects 41 times, even though
the result is always exactly the same. That seems really inefficient ...
doesn't it defeat the purpose of doing that param parsing in prepare()?
> Let facet queries hang off of pivots
> ------------------------------------
>
> Key: SOLR-4212
> URL: https://issues.apache.org/jira/browse/SOLR-4212
> Project: Solr
> Issue Type: Sub-task
> Components: search
> Affects Versions: 4.0
> Reporter: Steve Molloy
> Assignee: Shalin Shekhar Mangar
> Fix For: 5.3, Trunk
>
> Attachments: SOLR-4212-multiple-q.patch, SOLR-4212-multiple-q.patch,
> SOLR-4212.patch, SOLR-4212.patch, SOLR-4212.patch, SOLR-4212.patch,
> SOLR-4212.patch, SOLR-4212.patch, SOLR-6353-6686-4212.patch,
> SOLR-6353-6686-4212.patch, SOLR-6353-6686-4212.patch,
> SOLR-6353-6686-4212.patch, patch-4212.txt
>
>
> Facet pivot provide hierarchical support for computing data used to populate
> a treemap or similar visualization. TreeMaps usually offer users extra
> information by applying an overlay color on top of the existing square sizes
> based on hierarchical counts. This second count is based on user choices,
> representing, usually with gradient, the proportion of the square that fits
> the user's choices.
> The proposition is to use local parameters to specify facet query to apply
> for pivot which matches a tag set on facet query. Parameter format would look
> like:
> facet.pivot={!query=r1}category,manufacturer
> facet.query={!tag=r1}somequery
> facet.query={!tag=r1}somedate:[NOW-1YEAR TO NOW]
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]