[ https://issues.apache.org/jira/browse/SOLR-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479952#comment-16479952 ]
Hoss Man commented on SOLR-9480:
--------------------------------

Updated patch with all nocommits resolved and new ref-guide content on the relatedness() aggregate function and on using it to build SKGs.

I think this is pretty much good to go.

----

{quote}can you give a clue what are {{$fore,$back}} ?
{quote}

I'm not sure if I understand your question... are you asking about the syntax, or about the general concepts of foreground/background query as used in the relatedness function scores?

Syntactically they are regular query param {{$variable}} references passed as function arguments ... the sample request in the comment you replied to defined them as {{fore=body:%22harry+potter%22&back=\*:*}} ...but they can also just be passed in as string literals.

In general, the {{relatedness()}} function takes 2 parameters that define a "foreground query" and a "background query", which are then used to compute a heuristic score indicating what sort of statistical correlation there is between the query for each facet bucket and the foreground set, relative to the background set.

There's a more self-contained example in the ref-guide edits included in the latest patch...
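As a rough illustration of the foreground/background idea (not part of the patch), the sketch below recomputes the `foreground_popularity` and `background_popularity` figures reported in the ref-guide excerpt that follows, using the same 16 sample documents. It assumes the background set is the whole collection (`back=*:*`) and deliberately does not attempt the `relatedness` score itself; the helper names are hypothetical.

```python
# Sample documents from the ref-guide excerpt: (id, age, state, hobbies).
DOCS = [
    ("01", 15, "AZ", ["soccer", "painting", "cycling"]),
    ("02", 22, "AZ", ["swimming", "darts", "cycling"]),
    ("03", 27, "AZ", ["swimming", "frisbee", "painting"]),
    ("04", 33, "AZ", ["darts"]),
    ("05", 42, "AZ", ["swimming", "golf", "painting"]),
    ("06", 54, "AZ", ["swimming", "golf"]),
    ("07", 67, "AZ", ["golf", "painting"]),
    ("08", 71, "AZ", ["painting"]),
    ("09", 14, "CO", ["soccer", "frisbee", "skiing", "swimming", "skating"]),
    ("10", 23, "CO", ["skiing", "darts", "cycling", "swimming"]),
    ("11", 26, "CO", ["skiing", "golf"]),
    ("12", 35, "CO", ["golf", "frisbee", "painting", "skiing"]),
    ("13", 47, "CO", ["skiing", "darts", "painting", "skating"]),
    ("14", 51, "CO", ["skiing", "golf"]),
    ("15", 64, "CO", ["skating", "cycling"]),
    ("16", 73, "CO", ["painting"]),
]


def is_fore(doc):
    """Stand-in for the foreground query fore=age:[35 TO *]."""
    return doc[1] >= 35


def has_hobby(hobby):
    """Stand-in for a hobbies:<hobby> facet bucket query."""
    return lambda doc: hobby in doc[3]


def in_state(state):
    """Stand-in for a state:<state> facet bucket query."""
    return lambda doc: doc[2] == state


def fraction_of_background(*predicates):
    """Share of the background set (assumed here: every document)
    that satisfies all of the given predicates."""
    matches = [d for d in DOCS if all(p(d) for p in predicates)]
    return len(matches) / len(DOCS)


# Top-level hobby bucket "golf" (r1).
golf_fg = fraction_of_background(is_fore, has_hobby("golf"))
golf_bg = fraction_of_background(has_hobby("golf"))

# Nested state bucket "AZ" under "golf" (r2): the foreground is narrowed
# by the parent bucket, while the background popularity only looks at the state.
az_fg = fraction_of_background(is_fore, has_hobby("golf"), in_state("AZ"))
az_bg = fraction_of_background(in_state("AZ"))

print(golf_fg, golf_bg)  # 0.3125 0.375
print(az_fg, az_bg)      # 0.1875 0.5
```

These four values line up with callouts <2>, <3>, <6> and <7> in the facet response of the excerpt.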
{noformat}
.Sample Documents
[source,bash,subs="verbatim,callouts"]
----
curl -sS -X POST 'http://localhost:8983/solr/gettingstarted/update?commit=true' -d '[
{"id":"01",age:15,"state":"AZ","hobbies":["soccer","painting","cycling"]},
{"id":"02",age:22,"state":"AZ","hobbies":["swimming","darts","cycling"]},
{"id":"03",age:27,"state":"AZ","hobbies":["swimming","frisbee","painting"]},
{"id":"04",age:33,"state":"AZ","hobbies":["darts"]},
{"id":"05",age:42,"state":"AZ","hobbies":["swimming","golf","painting"]},
{"id":"06",age:54,"state":"AZ","hobbies":["swimming","golf"]},
{"id":"07",age:67,"state":"AZ","hobbies":["golf","painting"]},
{"id":"08",age:71,"state":"AZ","hobbies":["painting"]},
{"id":"09",age:14,"state":"CO","hobbies":["soccer","frisbee","skiing","swimming","skating"]},
{"id":"10",age:23,"state":"CO","hobbies":["skiing","darts","cycling","swimming"]},
{"id":"11",age:26,"state":"CO","hobbies":["skiing","golf"]},
{"id":"12",age:35,"state":"CO","hobbies":["golf","frisbee","painting","skiing"]},
{"id":"13",age:47,"state":"CO","hobbies":["skiing","darts","painting","skating"]},
{"id":"14",age:51,"state":"CO","hobbies":["skiing","golf"]},
{"id":"15",age:64,"state":"CO","hobbies":["skating","cycling"]},
{"id":"16",age:73,"state":"CO","hobbies":["painting"]},
]'
----

.Example Query
[source,bash,subs="verbatim,callouts"]
----
curl -sS -X POST http://localhost:8983/solr/gettingstarted/query -d 'rows=0&q=*:*
&back=*:*                                  # <1>
&fore=age:[35 TO *]                        # <2>
&json.facet={
  hobby : {
    type : terms,
    field : hobbies,
    limit : 5,
    sort : { r1: desc },                   # <3>
    facet : {
      r1 : "relatedness($fore,$back)",     # <4>
      location : {
        type : terms,
        field : state,
        limit : 2,
        sort : { r2: desc },               # <3>
        facet : {
          r2 : "relatedness($fore,$back)"  # <4>
        }
      }
    }
  }
}'
----
<1> Use the entire collection as our "Background Set"
<2> Use a query for "age >= 35" to define our (initial) "Foreground Set"
<3> For both the top level `hobbies` facet & the sub-facet on `state` we will be sorting on the `relatedness(...)` values
<4> In both calls to the `relatedness(...)` function, we use <<local-parameters-in-queries.adoc#parameter-dereferencing,Parameter Variables>> to refer to the previously defined `fore` and `back` queries.

.The Facet Response
[source,javascript,subs="verbatim,callouts"]
----
"facets":{
  "count":16,
  "hobby":{
    "buckets":[{
        "val":"golf",
        "count":6,                                // <1>
        "r1":{
          "relatedness":0.01225,
          "foreground_popularity":0.3125,         // <2>
          "background_popularity":0.375},         // <3>
        "location":{
          "buckets":[{
              "val":"az",
              "count":3,
              "r2":{
                "relatedness":0.00496,            // <4>
                "foreground_popularity":0.1875,   // <6>
                "background_popularity":0.5}},    // <7>
            {
              "val":"co",
              "count":3,
              "r2":{
                "relatedness":-0.00496,           // <5>
                "foreground_popularity":0.125,
                "background_popularity":0.5}}]}},
      {
        "val":"painting",
        "count":8,                                // <1>
        "r1":{
          "relatedness":0.01097,
          "foreground_popularity":0.375,
          "background_popularity":0.5},
        "location":{
          "buckets":[{
            ...
----
<1> Even though `hobbies:golf` has a lower total facet `count` than `hobbies:painting`, it has a higher `relatedness` score, indicating that relative to the Background Set (the entire collection) Golf has a stronger correlation to our Foreground Set (people age 35+) than Painting.
<2> The number of documents matching `age:[35 TO *]` _and_ `hobbies:golf` is 31.25% of the total number of documents in the Background Set
<3> 37.5% of the documents in the Background Set match `hobbies:golf`
<4> The state of Arizona (AZ) has a _positive_ relatedness correlation with the _nested_ Foreground Set (people ages 35+ who play Golf) compared to the Background Set -- ie: "People in Arizona are statistically more likely to be '35+ year old Golfers' than the country as a whole."
<5> The state of Colorado (CO) has a _negative_ correlation with the nested Foreground Set -- ie: "People in Colorado are statistically less likely to be '35+ year old Golfers' than the country as a whole."
<6> The number of documents matching `age:[35 TO *]` _and_ `hobbies:golf` _and_ `state:AZ` is 18.75% of the total number of documents in the Background Set
<7> 50% of the documents in the Background Set match `state:AZ`

NOTE: While it's very common to define the Background Set as `\*:*`, or some other super-set of the Foreground Query, it is not strictly required. The `relatedness(...)` function can be used to compare the statistical relatedness of sets of documents to orthogonal foreground/background queries.
{noformat}

> Graph Traversal for Significantly Related Terms (Semantic Knowledge Graph)
> --------------------------------------------------------------------------
>
>                 Key: SOLR-9480
>                 URL: https://issues.apache.org/jira/browse/SOLR-9480
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: Trey Grainger
>            Priority: Major
>         Attachments: SOLR-9480.patch, SOLR-9480.patch, SOLR-9480.patch, SOLR-9480.patch, SOLR-9480.patch, SOLR-9480.patch
>
>
> This issue is to track the contribution of the Semantic Knowledge Graph Solr Plugin (request handler), which exposes a graph-like interface for discovering and traversing significant relationships between entities within an inverted index.
> This data model has been described in the following research paper: [The Semantic Knowledge Graph: A compact, auto-generated model for real-time traversal and ranking of any relationship within a domain|https://arxiv.org/abs/1609.00464], as well as in presentations I gave in October 2015 at [Lucene/Solr Revolution|http://www.slideshare.net/treygrainger/leveraging-lucenesolr-as-a-knowledge-graph-and-intent-engine] and November 2015 at the [Bay Area Search Meetup|http://www.treygrainger.com/posts/presentations/searching-on-intent-knowledge-graphs-personalization-and-contextual-disambiguation/].
> The source code for this project is currently available at [https://github.com/careerbuilder/semantic-knowledge-graph], and the folks at CareerBuilder (where this was built) have given me the go-ahead to now contribute this back to the Apache Solr Project, as well.
> Check out the Github repository, research paper, or presentations for a more detailed description of this contribution. Initial patch coming soon.