[jira] [Comment Edited] (SOLR-8998) JSON Facet API child roll-ups

Yonik Seeley (JIRA) Fri, 22 Apr 2016 08:18:02 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-8998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243133#comment-15243133
 ]


Yonik Seeley edited comment on SOLR-8998 at 4/22/16 3:16 PM:
-------------------------------------------------------------

Although we don't need to implement everything all at once, we should be 
thinking ahead about everything we want to do.

h3. Existing block parent faceting example.
Incoming domain consists of children (reviews), which are then mapped to parents 
(books) before faceting is done:
{code}
q=type:review AND review_author:yonik
json.facet={
  genres : {
    type : field,
    field : genre,
    domain: { blockParent : "type:book" }
  }
}
{code}

h3. Desirable features:
- ability to "pretend" that parent documents have all values of their child 
documents (a union() set rollup?)
- numeric rollups (min, max, avg, etc.) and the ability to use range faceting 
over these values
- an API that's sharable (to the degree that makes sense) with other places 
that need rollups (e.g. normal join)
- maximum "persistence" of rolled-up values... meaning they should ideally be 
usable in any context that other field values would be usable in.
  -- example: multiple levels of sub-facets operating on values that were 
rolled up at a higher level
  -- use in function queries
  -- use in a sort, or retrievable from topdocs (SOLR-7830)

h3. Ideas:
- we already have a syntax for rolling up values over a bucket (avg(field1), 
min(field2) etc), re-use that as much as possible
- we're going to need some sort of context based registry for information about 
rolled-up child documents (and/or about which fields were rolled up)

h3. Use case 1:

We have products, which have multiple SKUs, and we want to facet by color on 
the products.
{code}
Parent1: { type:product, name:"Solr T-Shirt" }
Child1: { type:SKU, size:L, color:Red, inStock:true}
Child2: { type:SKU, size:L, color:Blue, inStock:false}
Child3: { type:SKU, size:M, color:Red, inStock:true}
Child4: { type:SKU, size:S, color:Blue, inStock:true}
{code}
Now, we want to facet by "color" and get back numbers of products (not number 
of SKUs).  Hence if our query is inStock:true, we want Blue:1 and Red:1.
Put another way, we want a virtual "color" field on Parent1 containing all the 
colors of matching child documents.
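
A minimal sketch of the counting semantics described above, in plain Python rather than Solr. The in-memory layout, the "parent" key and the helper name rolled_up_color_counts are assumptions made for illustration only; nothing here is an existing Solr API.

```python
from collections import Counter

# Hypothetical in-memory copy of the SKUs from the example above.
# The "parent" key stands in for block membership.
children = [
    {"parent": "Parent1", "size": "L", "color": "Red", "inStock": True},
    {"parent": "Parent1", "size": "L", "color": "Blue", "inStock": False},
    {"parent": "Parent1", "size": "M", "color": "Red", "inStock": True},
    {"parent": "Parent1", "size": "S", "color": "Blue", "inStock": True},
]


def rolled_up_color_counts(children, child_filter):
    """Count parents (not children) per color using a union()-style rollup."""
    # Virtual "color" field per parent: the set of colors of matching children.
    colors_by_parent = {}
    for child in children:
        if child_filter(child):
            colors_by_parent.setdefault(child["parent"], set()).add(child["color"])
    # Facet over the virtual field: each parent counts once per distinct color.
    counts = Counter()
    for colors in colors_by_parent.values():
        counts.update(colors)
    return dict(counts)


# Query inStock:true, counted per product versus per SKU.
product_counts = rolled_up_color_counts(children, lambda c: c["inStock"])
sku_counts = dict(Counter(c["color"] for c in children if c["inStock"]))
```

With the query inStock:true, counting per product gives one product for each of Red and Blue, while counting the SKUs directly would report Red twice.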

h4. Use case 1a: input domain is children
Main query finds children, and hence the root faceting domain consists of 
children.  The block join is done in a facet via {code} 
domain: { blockParent : "type:product" } {code}

h4. Use case 1b: input domain is parents from previous block join
Main query selected products by including a blockJoin filter (mapping from 
children to parents).

h4. Use case 1c: input domain is parents, no previous block join
No previous block join (or an irrelevant one), but we still want to roll up 
children (all children, or a specific subset).

h3. Approach 1: specify rollups at the point of the join
Specify rollups where the child->parent join/mapping is being done.

Our basic child->parent mapping is currently specified by:
{code}
    domain: { blockParent : "type:book" }
{code}
We could add rollup specifications to that in a number of different ways.
Reuse "blockParent" tag, but make it more structured, adding a "parentFilter" 
and then other rollups.
{code}
    domain: { 
      blockParent : {
        parentFilter : "type:book",
        average_rating : "avg(rating)" 
      }
    }
{code}
Downside: name collisions... say you wanted to name a rollup the same name as 
something like "parentFilter"
Advantages: flatter structure is simpler, and since we choose the rollup names, 
the namespace issue is likely just academic.

Or, we could have a specific "rollups" tag if a unique namespace is desired:
{code}
    domain: { 
      blockParent : {
        parentFilter : "type:book",
        rollups: {
          average_rating : "avg(rating)"
        } 
      }
    }
{code}

h4. Use of specified rollups:
{code}
q=type:review AND review_year:2016
json.facet={
  genres : {
    type : field,
    field : genre,
    domain: { 
      blockParent : {
        parentFilter : "type:book",
        book_rating : "avg(review_rating)"
      }
    },
    facet : {
       // things in here are calculated per-bucket of the parent facet
       avg_rating : "avg(book_rating)",
       min_rating : "min(book_rating)"
    },
    sort : "avg_rating desc"
  }
}
{code}
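
To make the intended two-level semantics concrete, here is a rough Python simulation of the request above. The sample books and reviews are invented, and the function genre_facet is a hypothetical stand-in, not how the facet module is implemented.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample data: parent books (id -> genre) and the child reviews
# matched by the main query, as (book id, review_rating) pairs.
books = {"b1": "scifi", "b2": "scifi", "b3": "fantasy"}
reviews = [("b1", 5), ("b1", 3), ("b2", 2), ("b3", 4), ("b3", 5)]


def genre_facet(books, reviews):
    # Step 1, at the join: book_rating = avg(review_rating) per parent.
    ratings = defaultdict(list)
    for book_id, rating in reviews:
        ratings[book_id].append(rating)
    book_rating = {book_id: mean(rs) for book_id, rs in ratings.items()}

    # Step 2, in the facet: bucket parents by genre and aggregate book_rating.
    by_genre = defaultdict(list)
    for book_id, rating in book_rating.items():
        by_genre[books[book_id]].append(rating)
    buckets = [
        {"val": genre, "count": len(rs), "avg_rating": mean(rs), "min_rating": min(rs)}
        for genre, rs in by_genre.items()
    ]

    # sort : "avg_rating desc"
    buckets.sort(key=lambda b: b["avg_rating"], reverse=True)
    return buckets


buckets = genre_facet(books, reviews)
```

Note that avg_rating here is an average of per-book averages: for the scifi bucket it is 3, whereas averaging the three scifi reviews directly would give about 3.33.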

h3. Approach 2: refer to children from the POV of the parent later
This approach does not explicitly specify any rollups at the point of the join, 
but lets one specify them later by referring to child fields using something 
like child.<child_field_name>.
Or perhaps even <child_type>.<child_field_name>... (related to SOLR-7672)
Or as a function: child(child_field_name)

{code}
q=type:review AND review_year:2016
json.facet={
  genres : {
    type : field,
    field : genre,
    domain: { blockParent :  "type:book" },
    facet : {
       // things in here are calculated per-bucket of the parent facet
       avg_rating : "avg(avg(child.review_rating))",
       min_rating : "min(avg(child.review_rating))"
    },
    sort : "avg_rating desc"
  }
}
{code}
Advantages:
 - fewer syntactic elements... simpler?

Disadvantages:
 - "child" name doesn't make sense for non-block (normal) join
 - more difficult to implement
   -- parsing... any place that would normally take a simple name needs to take 
a rollup function
   -- at the point of doing the join, we need to know what information is 
required to be kept?  we can figure this out, but it requires inspecting all 
sub-facets?
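
As a rough illustration of the last point, the sketch below walks a facet request (as a Python dict) and collects every child.<field> reference, which is the information the join would need to keep. The child. prefix is only the syntax proposed above, and the by_year sub-facet, the review_length field and the helper required_child_fields are made up for the example.

```python
import re

# Matches the proposed child.<child_field_name> syntax (hypothetical).
CHILD_REF = re.compile(r"\bchild\.([A-Za-z_][A-Za-z0-9_]*)")


def required_child_fields(facet_spec):
    """Recursively collect every child.<field> reference in a facet spec."""
    fields = set()
    if isinstance(facet_spec, dict):
        for value in facet_spec.values():
            fields |= required_child_fields(value)
    elif isinstance(facet_spec, list):
        for value in facet_spec:
            fields |= required_child_fields(value)
    elif isinstance(facet_spec, str):
        fields.update(CHILD_REF.findall(facet_spec))
    return fields


spec = {
    "genres": {
        "type": "field",
        "field": "genre",
        "domain": {"blockParent": "type:book"},
        "facet": {
            "avg_rating": "avg(avg(child.review_rating))",
            "min_rating": "min(avg(child.review_rating))",
            # Hypothetical nested sub-facet using another child field.
            "by_year": {
                "type": "field",
                "field": "year",
                "facet": {"max_len": "max(max(child.review_length))"},
            },
        },
        "sort": "avg_rating desc",
    }
}

needed = required_child_fields(spec)
```

The walk has to descend through every level of sub-facets before the join runs, which is the extra inspection cost mentioned above.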


was (Author: [email protected]):
Although we don't need to implement everything all at once, we should be 
thinking ahead about everything we want to do.

h3. Existing block parent faceting example.
Incoming domain consists of children (reviews) who are then mapped to parents 
(books) before faceting is done:
{code}
q=type:review AND review_author:yonik
json.facet={
  genres : {
    type : field,
    field : genre,
    domain: { blockParent : "type:book" }
  }
}
{code}

h3. Desirable features:
- ability to "pretend" that parent documents have all values of their child 
documents ( a union() set rollup?)
- numeric rollups (min, max, avg, etc) and the ability to use range faceting 
over these values
- an API that's sharable (to the degree that makes sense) with other places 
that need rollups (i.e. normal join)
- maximum "persistence" of rolled-up values... meaning they should be ideally 
usable in any context that other field values would be usable in.
  -- example: multiple levels of sub-facets operating on values that were 
rolled up at a higher level
  -- use in function queries
  -- use in a sort, or retrievable from topdocs (SOLR-7830)

h3. Ideas:
- we already have a syntax for rolling up values over a bucket (avg(field1), 
min(field2) etc), re-use that as much as possible
- we're going to need some sort of context based registry for information about 
rolled-up child documents (and/or about which fields were rolled up)

h3. Use cases:

We have products, which have multiple SKUs, and we want to facet by color on 
the products.
{code}
Parent1: { type:product, name:"Solr T-Shirt" }
Child1: { type:SKU, size:L, color:Red, inStock:true}
Child2: { type:SKU, size:L, color:Blue, inStock:false}
Child3: { type:SKU, size:M, color:Red, inStock:true}
Child4: { type:SKU, size:S, color:Blue, inStock:true}

Now, we want to facet by "color" and get back numbers of products (not number 
of SKUs).  Hence if our query is inStock:true, we want {Blue:1 and Red:1}.
Put another way, we want a virtual "color" field on Parent1 containing all the 
colors of matching child documents.
{code}

h3. Approach 1: specify rollups at the point of the join
Specify rollups where the child->parent join/mapping is being done.

Our basic child->parent mapping is currently specified by:
{code}
    domain: { blockParent : "type:book" }
{code}
We could add rollup specifications to that in a number of different ways.
Reuse "blockParent" tag, but make it more structured, adding a "parentFilter" 
and then other rollups.
{code}
    domain: { 
      blockParent : {
        parentFilter : "type:book",
        average_rating : "avg(rating)" 
      }
    }
{code}
Downside: name collisions... say you wanted to name a rollup the same name as 
something like "parentFilter"
Advantages: flatter structure is simpler, and since we chose rollup names, the 
namespace issue is likely just academic.

Or, we could have a specific "rollups" tag if a unique namespace is desired:
{code}
    domain: { 
      blockParent : {
        parentFilter : "type:book",
        rollups: {
          average_rating : "avg(rating)"
        } 
      }
    }
{code}

h4. Use of specified rollups:
{code}
q=type:review AND review_year:2016
json.facet={
  genres : {
    type : field,
    field : genre,
    domain: { 
      blockParent : {
        parentFilter : "type:book",
        book_rating : "avg(review_rating)"
      }
    },
    facet : {
       // things in here are calculated per-bucket of the parent facet
       avg_rating : "avg(book_rating)",
       min_rating : "min(book_rating)"
    },
    sort : "avg_rating desc"
  }
}
{code}

h3. Approach 2: refer to children from the POV of the parent later
This approach does not explicitly specify any rollups at the point of the join, 
but lets one specify them later by referring to child fields using something 
like child.<child_field_name>.
Or perhaps even <child_type>.<child_field_name>... (related to SOLR-7672)
Or as a function: child(child_field_name)

{code}
q=type:review AND review_year:2016
json.facet={
  genres : {
    type : field,
    field : genre,
    domain: { blockParent :  "type:book" },
    facet : {
       // things in here are calculated per-bucket of the parent facet
       avg_rating : "avg(avg(child.review_rating))",
       min_rating : "min(avg(child.review_rating))"
    },
    sort : "avg_rating desc"
  }
}
{code}
Advantages:
 - fewer syntactic elements... simpler?

Disadvantages:
 - "child" name doesn't make sense for non-block (normal) join
 - more difficult to implement
   -- parsing... any place that would normally take a simple name needs to take 
a rollup function
   -- at the point of doing the join, we need to know what information is 
required to be kept?  we can figure this out, but it requires inspecting all 
sub-facets?

> JSON Facet API child roll-ups
> -----------------------------
>
>                 Key: SOLR-8998
>                 URL: https://issues.apache.org/jira/browse/SOLR-8998
>             Project: Solr
>          Issue Type: New Feature
>          Components: Facet Module
>            Reporter: Yonik Seeley
>
> The JSON Facet API currently has the ability to map between parents and 
> children ( see http://yonik.com/solr-nested-objects/ )
> This issue is about adding a true rollup ability where parents would take on 
> derived values from their children.  The most important part (and the most 
> difficult part) will be the external API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

