[jira] [Commented] (SOLR-4212) Let facet queries hang off of pivots

Hoss Man (JIRA) Fri, 12 Jun 2015 17:03:07 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584299#comment-14584299
 ]


Hoss Man commented on SOLR-4212:
--------------------------------


Ok, I managed to review a little over half of the the current patch (trying to 
start with the small changes first and work my way up) -- comments below...


----

General feedback: This patch, on the whole, would be a lot easier to review -- 
and the final code a lot easier to make sense of -- if there were more 
class/method javadocs explaining what the purpose of everything was.  I realize 
that could be said for a lot of the existing Solr code, but adding/refactoring 
classses/methods is the best time to try and improve the situation since that's 
when you're thining about it the most.

----

bq. "SimpleFacets.getFacetCounts() was removed because it would have been weird 
for SimpleFacets to create and use its own sub-class 
RangeFacetProcessor/DateFacetProcessor. However, in this patch I have replaced 
that bit of code with a static method in FacetComponent which should eliminate 
the repetition in MLTHandler.

If you don't think it makes sense to keep this method where it is that's fine, 
but it is a public method designed for people writting custom request handlers 
to use to add faceting to their stuff -- so at the very least you should leave 
a deprecated stub in it's place that calls your new method.

It also confuses the hell out of me that in the process of moving this method 
you made the method signature more complicated to use (more args to understand, 
more exceptions to deal with) and got rid of the javadocs so the new more 
complicated call signature is even harder to understand -- and I, frankly, 
can't make sense of why any of that is neccessary (even if should now live in a 
diff class because of the inheritence)...

* why does the caller have to pass in an empty NamedList to be mutated? why 
can't the method just return a NamedList like the old method did?
* why does the caller have to instantiate the date & range processors 
themselves?
** in both of the places this method is called, the processors are constructed 
just to pass to this method and then thrown away.
** So why can't this static method construct them internally (using the 
request, docset, params, and response builder from the SimpleFacets object 
that's also passed as an arg) and keep the method signature simple?
* why does this new method catch & wrap SyntaxErrors but not IOExceptions like 
the old method?

(If there reasons for these changes then great -- but they aren't clear just 
from reading the patch, and there's no javadocs to get guidance from)

----

bq. I removed the RangeFacetAccumulator class and replaced it with a 
LinkedHashMap of facetStr to RangeFacet. This RangeFacet class extends 
FacetBase just like PivotFacetValue and is responsible for aggregation of a 
single facet value.

* I don't see a class called "RangeFacet" ... it looks like you talking about 
RangeFacetValue?
** FWIW: PivotFacetValue doesn't extend FacetBase because it models a single 
(value,count) pair in the context of a PivotFacetField (which may be 
hierarchical) ... you may be thinking of "PivotFacet"
** ... in which case, should this class _atually_ be named "RangeFacet" ?
** either way: can we please beef up the javadocs of this class to be crystal 
clear about what it's modeling -- at first glance I thought it was just 
modeling a facet.range *param* (with the field, start, end, calculator) before 
I realized it not only models the field + the indiviual buckets, but also the 
values in those buckets (but not the other params)
* Isn't RangeFacetValue.fieldName redundent here? - it's already handled by 
FacetBase.facetOn.
* I don't think there is any reason to initialize shardFieldValues in 
mergeContributionFromShard before the {{rangeFacet = rangeFromShard}} 
short-circut, it's just wasted cycles the first time the method gets called, 
correct?

----

Looking at the changes to PivotFacetValue...

* why a "Map<String, RangeFacetValue>" for rangeCounts instead of a NamedList?
** it seems like a really bad idea for these to not be returned in a 
deterministic order -- it should be in the same order as the top level 
facet.range results (which is hte order of the facet.range params in the 
request)
** no idea if this HashMap usage currently causes problems in the shard 
merging, but it smells fishy -- and either way the user should be able to count 
on getting the range facets back in the same order the asked for them
* i didn't dig into the full ramifications of this, but you've got 
PivotFacetValue constructing RangeFacetValue objects and passing the response 
key in as the "fieldName" argument to the constructor -- those aren't the same 
thing.  Either the method call here is wrong, or the constructor arg should be 
renamed -- either way any usage of this "fieldName" constructor arg and 
internal usage should be sanity checked to ensure it's used consistnetly 
everywhere
** are there tests of using a diff response key from the fieldName when hanging 
ranges under pivots? (both single node and cloud?)
* why is PivotFacetValue.createFromNamedList conditionally calling 
RangeFacetValue.mergeContributionFromShard?
** Unless i'm missing something, that method is only ever creating entirely new 
instances of PivotFacetValue, so there shouldn't be any existing data to merge 
into ... correct?
* in PivotFacetValue.mergeContributionFromShard, the dozen or so lines of 
looping over each shard's response to either construct a new RangeFacetValue or 
call RangeFacetValue.mergeContributionFromShard seems like it could/should be 
refactored into a static helper method (in the RangeFacetValue class)
** the exact same loop is used in the only other place 
mergeContributionFromShard is called (FacetComponent)
** if PivotFacetValue.createFromNamedList really does need to conditional merge 
(see previuos comment), then it's loop would also be exactly the same, and it 
could use the same static helper method.

----

Since the input to PivotFacetHelper.mergeQueryCounts is already in the correct 
response format (ie: no special objects need constructed to model the simple 
counts) can't it just return "querycounts" if "receivingMap" is null?  (instead 
of looping & adding)

----

With the current patch, something is wrong with how localparams are dealt with 
when trying to use facet.ranges inside of pivots.  These queries are examples 
of doing range queries on the same field with diff gap params, and 
(independently) doing a pivot...

{noformat}
http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&facet=true&facet.pivot=manu_id_s,cat&facet.range={!key=key1%20tag=t1%20facet.range.start=0%20facet.range.end=1000%20facet.range.gap=200}price&facet.range={!key=key2%20tag=t1%20facet.range.start=0%20facet.range.end=500%20facet.range.gap=100}price

http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&facet=true&facet.pivot=manu_id_s,cat&facet.range={!key=key1%20tag=t1%20facet.range.gap=200}price&f.price.facet.range.start=0&f.price.facet.range.end=1000&facet.range={!key=key2%20tag=t1%20facet.range.gap=100}price
{noformat}

...but as soon as you try to hang those ranges on the pivot you get a local 
errors...

{noformat}
http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&facet=true&facet.pivot={!range=t1}manu_id_s,cat&facet.range={!key=key1%20tag=t1%20facet.range.start=0%20facet.range.end=1000%20facet.range.gap=200}price&facet.range={!key=key2%20tag=t1%20facet.range.start=0%20facet.range.end=500%20facet.range.gap=100}price
...
Missing required parameter: f.price.facet.range.start (or default: 
facet.range.start)


http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&facet=true&facet.pivot={!range=t1}manu_id_s,cat&facet.range={!key=key1%20tag=t1%20facet.range.gap=200}price&f.price.facet.range.start=0&f.price.facet.range.end=1000&facet.range={!key=key2%20tag=t1%20facet.range.gap=100}price
...
Missing required parameter: f.price.facet.range.gap (or default: 
facet.range.gap)
{noformat}

...the fact that this error only pops up when combining pivots + ranges 
suggests that the patch didn't _break_ the existing code controlling how the 
param parsing is done, it aparently just isn't re-using the exact same parsing 
code in both situations?  Which also means that within a single request, the 
range facet params are still getting parsed (at least) twice -- once 
(correctly) for the top level facets, and (at least) once (incorrectly) when 
hanging off of the pivot.

bq. ...Finally I settled for a crude approach of enhancing FacetBase with tags 
and just parsing the facet queries and ranges and putting them in the request 
context. ...

Hmmm....

I haven't seen this piece of code yet, but it sounds like it only prevents the 
"tags" from being parsed over and over again -- from what I've seen so far, 
RangeFacetProcessor.getFacetRangeCounts(ParsedParams,NamedList) which delegates 
to 
RangeFacetProcessor.getFacetRangeCounts(SchemaField,RangeEndpointCalculator,ParsedParams)
is still going to cause a new RangeEndpointCalculator to be constructed, and 
the exact same start/end/include/etc... Strings to be parsed over and over and 
over again, for every pivot value, at every level.  These things should really 
just be parsed *once* in prepare(), put into the request context, and reused at 
every level of every pivot.

To put it another way...


With the patch right now, if you do the following query using the techproducts 
example data:

{noformat}
http://localhost:8983/solr/techproducts/select?rows=0&q=*:*&facet=true&facet.range=manufacturedate_dt&facet.range.start=2000-01-01T00:00:00Z&facet.range.end=2010-01-01T00:00:00Z&facet.range.gap=%2B1YEAR
{noformat}

... then 41 DateRangeEndpointCalculator objects get instantated even though 1 
is plenty;  the start value of "2000-01-01T00:00:00Z" and the end value 
"2010-01-01T00:00:00Z" each get parsed into Date objects 41 times, even though 
the result is always exactly the same.  That seems really inefficient ... 
doesn't it defeat the purpose of doing that param parsing in prepare()?




> Let facet queries hang off of pivots
> ------------------------------------
>
>                 Key: SOLR-4212
>                 URL: https://issues.apache.org/jira/browse/SOLR-4212
>             Project: Solr
>          Issue Type: Sub-task
>          Components: search
>    Affects Versions: 4.0
>            Reporter: Steve Molloy
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 5.3, Trunk
>
>         Attachments: SOLR-4212-multiple-q.patch, SOLR-4212-multiple-q.patch, 
> SOLR-4212.patch, SOLR-4212.patch, SOLR-4212.patch, SOLR-4212.patch, 
> SOLR-4212.patch, SOLR-4212.patch, SOLR-6353-6686-4212.patch, 
> SOLR-6353-6686-4212.patch, SOLR-6353-6686-4212.patch, 
> SOLR-6353-6686-4212.patch, patch-4212.txt
>
>
> Facet pivot provide hierarchical support for computing data used to populate 
> a treemap or similar visualization. TreeMaps usually offer users extra 
> information by applying an overlay color on top of the existing square sizes 
> based on hierarchical counts. This second count is based on user choices, 
> representing, usually with gradient, the proportion of the square that fits 
> the user's choices.
> The proposition is to use local parameters to specify facet query to apply 
> for pivot which matches a tag set on facet query. Parameter format would look 
> like:
> facet.pivot={!query=r1}category,manufacturer
> facet.query={!tag=r1}somequery
> facet.query={!tag=r1}somedate:[NOW-1YEAR TO NOW]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (SOLR-4212) Let facet queries hang off of pivots

Reply via email to