[
https://issues.apache.org/jira/browse/SOLR-8925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241884#comment-15241884
]
Joel Bernstein commented on SOLR-8925:
--------------------------------------
Initial unit tests are coming along well. I plan to move on to manual testing
with the Enron email dataset, and if that looks good, I think this is pretty
close to being committed to trunk.
> Add gatherNodes Streaming Expression to support breadth first traversals
> ------------------------------------------------------------------------
>
> Key: SOLR-8925
> URL: https://issues.apache.org/jira/browse/SOLR-8925
> Project: Solr
> Issue Type: New Feature
> Reporter: Joel Bernstein
> Assignee: Joel Bernstein
> Fix For: 6.1
>
> Attachments: SOLR-8925.patch, SOLR-8925.patch, SOLR-8925.patch,
> SOLR-8925.patch, SOLR-8925.patch, SOLR-8925.patch, SOLR-8925.patch
>
>
> The gatherNodes Streaming Expression is a flexible, general-purpose
> breadth-first graph traversal. It uses the same parallel join under the
> covers as (SOLR-8888) but is much more generalized and can be used for a
> wide range of use cases.
> Sample syntax:
> {code}
> gatherNodes(friends,
>             gatherNodes(friends,
>                         search(articles, q="body:(queryA)", fl="author"),
>                         walk="author->user",
>                         gather="friend"),
>             walk="friend->user",
>             gather="friend",
>             scatter="branches, leaves")
> {code}
> The expression above is evaluated as follows:
> 1) The inner search() expression is evaluated on the *articles* collection,
> emitting a Stream of Tuples with the author field populated.
> 2) The inner gatherNodes() expression reads the Tuples from the search()
> stream and traverses to the *friends* collection by performing a distributed
> join between the articles.author and friends.user fields. It gathers the
> value from the *friend* field during the join.
> 3) The inner gatherNodes() expression then emits the *friend* Tuples. By
> default the gatherNodes function emits only the leaves, which in this case
> are the *friend* Tuples.
> 4) The outer gatherNodes() expression reads the *friend* Tuples and traverses
> again in the *friends* collection, this time performing the join between the
> *friend* Tuples emitted in step 3 and the friends.user field. This collects
> the friends of friends.
> 5) The outer gatherNodes() expression emits the entire graph that was
> collected. This is controlled by the "scatter" parameter. In the example, the
> *root* nodes are the authors, the *branches* are the authors' friends, and
> the *leaves* are the friends of friends.
> This traversal is fully distributed and cross collection.
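The five steps above can be sketched in memory as follows. This is only an illustration of the traversal semantics, not the Solr implementation: the `articles` and `friends` lists, the sample names, and the `search` and `gather_nodes` helpers are all hypothetical, there is no distribution or parallel join, and the sketch assumes each gathered value is emitted once.

```python
# Hypothetical in-memory stand-ins for the two collections (not real Solr data).
articles = [
    {"author": "alice", "body": "text matching queryA"},
    {"author": "bob", "body": "unrelated text"},
]
friends = [
    {"user": "alice", "friend": "carol"},
    {"user": "alice", "friend": "dave"},
    {"user": "carol", "friend": "erin"},
    {"user": "dave", "friend": "erin"},
]


def search(collection, term, fl):
    """Step 1: emit tuples holding only the `fl` field for matching documents."""
    return [{fl: doc[fl]} for doc in collection if term in doc["body"]]


def gather_nodes(collection, incoming, walk_from, walk_to, gather):
    """One breadth-first hop: join incoming[walk_from] to collection[walk_to]
    and emit each distinct value of the `gather` field once (an assumption)."""
    frontier = {t[walk_from] for t in incoming}
    seen, out = set(), []
    for doc in collection:
        if doc[walk_to] in frontier and doc[gather] not in seen:
            seen.add(doc[gather])
            out.append({gather: doc[gather]})
    return out


roots = search(articles, "queryA", "author")                          # step 1
branches = gather_nodes(friends, roots, "author", "user", "friend")   # steps 2-3
leaves = gather_nodes(friends, branches, "friend", "user", "friend")  # step 4
graph = branches + leaves  # step 5, mimicking scatter="branches, leaves"
print(graph)
```

With this sample data the branches are alice's friends and the single leaf is the friend they share.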
> *Aggregations* are also supported during the traversal. This can be useful
> for making recommendations based on co-occurrence counts.
> Sample syntax:
> {code}
> top(
>     gatherNodes(baskets,
>                 search(baskets, q="prodid:X", fl="basketid", rows="500",
>                        sort="random_7897987 asc"),
>                 walk="basketid->basketid",
>                 gather="prodid",
>                 fl="prodid, price",
>                 count(*),
>                 avg(price)),
>     n=4,
>     sort="count(*) desc, avg(price) asc")
> {code}
> In the expression above, the inner search() function searches the baskets
> collection for 500 random basketids that contain prodid X.
> gatherNodes then traverses the baskets collection and gathers all the
> prodids for the selected basketids.
> It also aggregates the count and average price for each prodid collected.
> The count reflects the co-occurrence count between each gathered prodid and
> prodid X. The outer *top* expression selects the top 4 prodids emitted from
> gatherNodes, based on the co-occurrence count and avg price.
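The co-occurrence aggregation described above can be sketched in plain Python. The `baskets` list, its prices, and the `co_occurrence` helper are hypothetical and chosen only for illustration; the sketch skips the random sampling of 500 baskets and simply uses every basket containing the seed product. Note that in this sketch the seed product X is itself gathered, since it occurs in every selected basket.

```python
from collections import defaultdict

# Hypothetical basket documents: one row per (basket, product) pair.
baskets = [
    {"basketid": "b1", "prodid": "X", "price": 10.0},
    {"basketid": "b1", "prodid": "A", "price": 4.0},
    {"basketid": "b1", "prodid": "B", "price": 6.0},
    {"basketid": "b2", "prodid": "X", "price": 10.0},
    {"basketid": "b2", "prodid": "A", "price": 6.0},
    {"basketid": "b3", "prodid": "B", "price": 2.0},
]


def co_occurrence(docs, seed_prodid, n):
    """Gather prodids from baskets containing seed_prodid, then keep the top n
    by count descending and average price ascending."""
    # Inner search(): basketids of baskets that contain the seed product.
    basket_ids = {d["basketid"] for d in docs if d["prodid"] == seed_prodid}

    # walk basketid->basketid, gather prodid, aggregating count and price sum.
    counts = defaultdict(int)
    price_sums = defaultdict(float)
    for d in docs:
        if d["basketid"] in basket_ids:
            counts[d["prodid"]] += 1
            price_sums[d["prodid"]] += d["price"]

    rows = [
        {"prodid": p, "count(*)": counts[p], "avg(price)": price_sums[p] / counts[p]}
        for p in counts
    ]
    # Outer top(): sort="count(*) desc, avg(price) asc", then keep n rows.
    rows.sort(key=lambda r: (-r["count(*)"], r["avg(price)"]))
    return rows[:n]


top4 = co_occurrence(baskets, "X", n=4)
print(top4)
```

Basket b3 does not contain X, so its row for product B does not contribute to B's count.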
> Like all streaming expressions, the gatherNodes expression can be combined
> with other streaming expressions. For example, the following expression uses
> a hash join to intersect the networks of friends rooted at authors found
> with different queries:
> {code}
> hashInnerJoin(
>     gatherNodes(friends,
>                 gatherNodes(friends,
>                             search(articles,
>                                    q="body:(queryA)", fl="author"),
>                             walk="author->user",
>                             gather="friend"),
>                 walk="friend->user",
>                 gather="friend",
>                 scatter="branches, leaves"),
>     gatherNodes(friends,
>                 gatherNodes(friends,
>                             search(articles,
>                                    q="body:(queryB)", fl="author"),
>                             walk="author->user",
>                             gather="friend"),
>                 walk="friend->user",
>                 gather="friend",
>                 scatter="branches, leaves"),
>     on="friend"
> )
> {code}
>
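To illustrate how a hash join intersects the two friend networks on the friend field, here is a minimal in-memory sketch. The `hash_inner_join` helper and both sample networks are hypothetical, and building the hash table over the right-hand stream is an arbitrary choice made for the sketch, not a statement about how Solr does it.

```python
def hash_inner_join(left, right, on):
    """Emit merged tuples for every left/right pair sharing the `on` value."""
    # Build phase: index the right-hand tuples by join key.
    index = {}
    for t in right:
        index.setdefault(t[on], []).append(t)

    # Probe phase: stream the left-hand tuples through the index.
    joined = []
    for l in left:
        for r in index.get(l[on], []):
            merged = dict(r)
            merged.update(l)  # left-hand fields win on a name clash
            joined.append(merged)
    return joined


# Hypothetical output of the two gatherNodes traversals (queryA and queryB).
network_a = [{"friend": "carol"}, {"friend": "dave"}, {"friend": "erin"}]
network_b = [{"friend": "erin"}, {"friend": "frank"}]

common = hash_inner_join(network_a, network_b, on="friend")
print(common)
```

Only friends reachable from authors matching both queries survive the join.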