[
https://issues.apache.org/jira/browse/SOLR-8998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243133#comment-15243133
]
Yonik Seeley edited comment on SOLR-8998 at 4/17/16 10:40 PM:
--------------------------------------------------------------
Although we don't need to implement everything all at once, we should be
thinking ahead about everything we want to do.
h3. Existing block parent faceting example.
The incoming domain consists of children (reviews), which are then mapped to
parents (books) before faceting is done:
{code}
q=type:review AND review_author:yonik
json.facet={
  genres : {
    type : field,
    field : genre,
    domain: { blockParent : "type:book" }
  }
}
{code}
h3. Desirable features:
- ability to "pretend" that parent documents have all values of their child
documents (a union() set rollup?)
- numeric rollups (min, max, avg, etc.) and the ability to use range faceting
over these values
- an API that's sharable (to the degree that makes sense) with other places
that need rollups (e.g. normal join)
- maximum "persistence" of rolled-up values, meaning they should ideally be
usable in any context in which other field values are usable.
-- example: multiple levels of sub-facets operating on values that were
rolled up at a higher level
-- use in function queries
-- use in a sort, or retrievable from topdocs (SOLR-7830)
h3. Ideas:
- we already have a syntax for rolling up values over a bucket (avg(field1),
min(field2), etc.); re-use that as much as possible
- we're going to need some sort of context-based registry for information about
rolled-up child documents (and/or about which fields were rolled up)
h3. Use cases:
We have products, which have multiple SKUs, and we want to facet by color on
the products.
{code}
Parent1: { type:product, name:"Solr T-Shirt" }
Child1: { type:SKU, size:L, color:Red, inStock:true}
Child2: { type:SKU, size:L, color:Blue, inStock:false}
Child3: { type:SKU, size:M, color:Red, inStock:true}
Child4: { type:SKU, size:S, color:Blue, inStock:true}
{code}
Now, we want to facet by "color" and get back the number of products (not the
number of SKUs). Hence if our query is inStock:true, we want {Blue:1, Red:1}.
Put another way, we want a virtual "color" field on Parent1 containing all the
colors of matching child documents.
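To make the expected counts concrete, here is a minimal Python sketch of the semantics only (not of how Solr would implement it). The in-memory `children` list, the `parent` key, and the helper name `facet_parents_by_child_field` are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical in-memory copy of the example block: one parent, four SKUs.
children = [
    {"parent": "Parent1", "size": "L", "color": "Red", "inStock": True},
    {"parent": "Parent1", "size": "L", "color": "Blue", "inStock": False},
    {"parent": "Parent1", "size": "M", "color": "Red", "inStock": True},
    {"parent": "Parent1", "size": "S", "color": "Blue", "inStock": True},
]


def facet_parents_by_child_field(children, field, matches):
    """Count distinct parents per child field value (a union-style rollup)."""
    parents_per_value = defaultdict(set)
    for child in children:
        if matches(child):
            # Each parent is recorded once per value, however many of its
            # children carry that value.
            parents_per_value[child[field]].add(child["parent"])
    return {value: len(parents) for value, parents in parents_per_value.items()}


# Query inStock:true, faceting on "color" but counting products, not SKUs.
counts = facet_parents_by_child_field(children, "color", lambda c: c["inStock"])
```

Counting SKUs instead would give Red:2, which is exactly the difference the rollup is meant to hide.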
h3. Approach 1: specify rollups at the point of the join
Specify rollups where the child->parent join/mapping is being done.
Our basic child->parent mapping is currently specified by:
{code}
domain: { blockParent : "type:book" }
{code}
We could add rollup specifications to that in a number of different ways.
One option: reuse the "blockParent" tag, but make it more structured, adding a
"parentFilter" and then other rollups.
{code}
domain: {
  blockParent : {
    parentFilter : "type:book",
    average_rating : "avg(rating)"
  }
}
{code}
Downside: name collisions, e.g. if you wanted to give a rollup the same name as
a reserved key such as "parentFilter".
Advantages: the flatter structure is simpler, and since we choose the rollup
names, the namespace issue is likely just academic.
Or, we could have a specific "rollups" tag if a unique namespace is desired:
{code}
domain: {
  blockParent : {
    parentFilter : "type:book",
    rollups: {
      average_rating : "avg(rating)"
    }
  }
}
{code}
h4. Use of specified rollups:
{code}
q=type:review AND review_year:2016
json.facet={
  genres : {
    type : field,
    field : genre,
    domain: {
      blockParent : {
        parentFilter : "type:book",
        book_rating : "avg(review_rating)"
      }
    },
    facet : {
      // things in here are calculated per-bucket of the parent facet
      avg_rating : "avg(book_rating)",
      min_rating : "min(book_rating)"
    },
    sort : "avg_rating desc"
  }
}
{code}
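The two aggregation levels in this request can be sketched in plain Python. This only illustrates the intended semantics under assumed data; the `books` and `reviews` structures and the function name `rollup_then_facet` are hypothetical, and nothing here reflects actual Solr internals.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data: book id -> genre, and (book id, review_rating) pairs for
# the reviews that matched the query.
books = {"b1": "scifi", "b2": "scifi", "b3": "fantasy"}
reviews = [("b1", 4.0), ("b1", 2.0), ("b2", 5.0), ("b3", 3.0)]


def rollup_then_facet(books, reviews):
    # Level 1 (at the join): book_rating = avg(review_rating) per parent.
    ratings_per_book = defaultdict(list)
    for book_id, rating in reviews:
        ratings_per_book[book_id].append(rating)
    book_rating = {b: mean(r) for b, r in ratings_per_book.items()}

    # Level 2 (per genre bucket): avg and min over the rolled-up values.
    per_genre = defaultdict(list)
    for book_id, rating in book_rating.items():
        per_genre[books[book_id]].append(rating)
    buckets = [
        {"val": g, "count": len(v), "avg_rating": mean(v), "min_rating": min(v)}
        for g, v in per_genre.items()
    ]
    # Equivalent of sort : "avg_rating desc".
    buckets.sort(key=lambda b: b["avg_rating"], reverse=True)
    return buckets


buckets = rollup_then_facet(books, reviews)
```

With this data the scifi bucket averages the per-book values 3.0 and 5.0, giving 4.0, whereas averaging its three reviews directly would give about 3.67. The rollup therefore weights each book equally regardless of how many reviews it has.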
h3. Approach 2: refer to children from the POV of the parent later
This approach does not explicitly specify any rollups at the point of the join,
but lets one specify them later by referring to child fields using something
like child.<child_field_name>.
Or perhaps even <child_type>.<child_field_name>... (related to SOLR-7672)
Or as a function: child(child_field_name)
{code}
q=type:review AND review_year:2016
json.facet={
  genres : {
    type : field,
    field : genre,
    domain: { blockParent : "type:book" },
    facet : {
      // things in here are calculated per-bucket of the parent facet
      avg_rating : "avg(avg(child.review_rating))",
      min_rating : "min(avg(child.review_rating))"
    },
    sort : "avg_rating desc"
  }
}
{code}
Advantages:
- fewer syntactic elements... simpler?
Disadvantages:
- "child" name doesn't make sense for non-block (normal) join
- more difficult to implement
-- parsing: any place that would normally take a simple name needs to accept
a rollup function
-- at the point of doing the join, we need to know what information must be
kept. We can figure this out, but it presumably requires inspecting all
sub-facets.
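As a rough illustration of what "inspecting all sub-facets" might involve, the sketch below walks a facet request (already parsed into nested Python dicts) and collects every field referenced as child.<child_field_name>. The regex-based matching, the function name `required_child_fields`, and the nested `by_year` sub-facet are assumptions made for this example; a real implementation would work on parsed function expressions rather than raw strings.

```python
import re

# Matches the hypothetical child.<child_field_name> reference syntax.
CHILD_REF = re.compile(r"\bchild\.(\w+)")


def required_child_fields(facet):
    """Recursively collect child fields referenced anywhere in a facet tree."""
    fields = set()
    for value in facet.values():
        if isinstance(value, dict):
            # Descend into nested sub-facets and domain specifications.
            fields |= required_child_fields(value)
        elif isinstance(value, str):
            fields.update(CHILD_REF.findall(value))
    return fields


# The Approach 2 request, plus one made-up nested sub-facet.
genres = {
    "type": "field",
    "field": "genre",
    "domain": {"blockParent": "type:book"},
    "facet": {
        "avg_rating": "avg(avg(child.review_rating))",
        "min_rating": "min(avg(child.review_rating))",
        "by_year": {
            "type": "field",
            "field": "publish_year",
            "facet": {"latest_review": "max(max(child.review_year))"},
        },
    },
    "sort": "avg_rating desc",
}

needed = required_child_fields(genres)
```

The resulting set is what the join step would have to retain per parent before any sub-facet runs.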
> JSON Facet API child roll-ups
> -----------------------------
>
> Key: SOLR-8998
> URL: https://issues.apache.org/jira/browse/SOLR-8998
> Project: Solr
> Issue Type: New Feature
> Components: Facet Module
> Reporter: Yonik Seeley
>
> The JSON Facet API currently has the ability to map between parents and
> children ( see http://yonik.com/solr-nested-objects/ )
> This issue is about adding a true rollup ability where parents would take on
> derived values from their children. The most important part (and the most
> difficult part) will be the external API.