[ https://issues.apache.org/jira/browse/SOLR-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Watters updated SOLR-7543:
--------------------------------
    Description: 
I have a GraphQuery that I implemented a long time back that allows a user to 
specify a "startQuery" to identify which documents to start graph traversal 
from.  It then gathers up the edge ids for those documents and optionally 
applies an additional filter.  The query is then re-executed repeatedly until 
no new edge ids are identified.  I am currently hosting this code at 
https://github.com/kwatters/solrgraph, and I would like to work with the 
community to get feedback and ultimately get it committed back as a Lucene 
query.
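For illustration, here is a hypothetical Java helper showing how the ids gathered in one iteration could be turned into the next iteration's query, written in Lucene's classic query-parser syntax. The names escape and orQuery, the example field name, and the choice of a query string rather than a programmatically built Lucene Query are assumptions of this sketch, not code taken from the solrgraph repository.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

class OrQuerySketch {
    // Characters with special meaning in Lucene's classic query-parser syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Hypothetical helper: backslash-escapes special characters and whitespace
    // so each id is parsed as one literal term. (An id spelled exactly "OR" or
    // "AND" would still need extra care; this sketch does not handle that.)
    static String escape(String term) {
        StringBuilder sb = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Hypothetical helper: builds "field:(id1 OR id2 ...)" from the ids gathered
    // in one iteration. An empty set means no new ids were found, i.e. done.
    static String orQuery(String field, Set<String> ids) {
        if (ids.isEmpty()) {
            throw new IllegalArgumentException("no ids: traversal has reached a fixpoint");
        }
        return field + ":(" + new TreeSet<>(ids).stream() // sorted for stable output
                .map(OrQuerySketch::escape)
                .collect(Collectors.joining(" OR ")) + ")";
    }

    public static void main(String[] args) {
        System.out.println(orQuery("toField", Set.of("B", "C", "node:7")));
        // prints toField:(B OR C OR node\:7)
    }
}
```

Each pass would run the generated query, gather the ids from the newly matched documents, and call orQuery again; the loop ends when no unseen ids remain.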

Here is a more detailed description of the parameters for the query/graph 
traversal:

q - the initial start query that identifies the universe of documents to start 
traversal from.
fromField - the name of the field that contains the node id.
toField - the name of the field that contains the edge id(s).
traversalFilter - an additional query that can be supplied to limit the scope 
of graph traversal to just the edges that satisfy the traversalFilter query.
maxDepth - integer specifying how deep the breadth-first search should go.
returnStartNodes - boolean that determines whether the documents that matched 
the original "q" are returned as part of the graph.
onlyLeafNodes - boolean that filters the graph query to return only 
documents/nodes that have no edges.
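One way to carry these parameters in code is a small immutable holder. The record below is a hypothetical sketch: its component names mirror the list above, but the types, the validation, the example field names "node_id" and "edge_ids", and the convention that a negative maxDepth means "no limit" are assumptions rather than part of the proposal.

```java
import java.util.Objects;

class GraphParamsSketch {
    // Hypothetical holder; names follow the parameter list, while the types,
    // validation and the "negative maxDepth = unlimited" rule are assumptions.
    record GraphQueryParams(
            String q,                 // start query
            String fromField,         // field holding the node id
            String toField,           // field holding the edge id(s)
            String traversalFilter,   // null = no additional constraint
            int maxDepth,             // negative = no depth limit (assumed)
            boolean returnStartNodes, // include documents matched by q
            boolean onlyLeafNodes) {  // keep only documents with no edges

        GraphQueryParams {
            Objects.requireNonNull(q, "q is required");
            requireField(fromField, "fromField");
            requireField(toField, "toField");
        }

        private static void requireField(String value, String name) {
            if (value == null || value.isBlank()) {
                throw new IllegalArgumentException(name + " is required");
            }
        }

        boolean depthUnlimited() {
            return maxDepth < 0;
        }
    }

    public static void main(String[] args) {
        GraphQueryParams p = new GraphQueryParams(
                "id:A", "node_id", "edge_ids", null, 2, true, false);
        System.out.println(p + " unlimited=" + p.depthUnlimited());
    }
}
```

Validating the two field names up front reflects that traversal cannot proceed without them, whereas traversalFilter is optional.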

The start set of documents is identified by "q", which can be any arbitrary 
Lucene query.  The traversal collects the values in the fromField, creates an 
OR query with those values, optionally applies an additional constraint from 
the "traversalFilter", and walks the result set until no new edges are 
detected.  Traversal can also be stopped N hops away, as defined by maxDepth.  
This is a breadth-first search (BFS) algorithm.  Cycle detection is done by 
never revisiting the same document for edge extraction.
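The traversal described above can be sketched in Java over a small in-memory index instead of a real Lucene index. Everything here is hypothetical: each document is modeled as a node id (its fromField value) plus a list of edge ids (its toField values); the sketch assumes edge ids gathered from matched documents are looked up against node ids (swap the two fields for the opposite direction); q and traversalFilter are plain predicates rather than Lucene queries; a negative maxDepth means unlimited; and a "leaf" is taken to be a document with no edge ids.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

class GraphTraversalSketch {
    // Hypothetical document model: node id (fromField) and edge ids (toField).
    record Doc(String nodeId, List<String> edgeIds) {}

    static Set<String> traverse(List<Doc> index,
                                Predicate<Doc> q,
                                Predicate<Doc> traversalFilter,
                                int maxDepth, // assumption: negative = unlimited
                                boolean returnStartNodes,
                                boolean onlyLeafNodes) {
        // Stand-in for a term lookup: node id -> documents carrying that id.
        Map<String, List<Doc>> byNodeId = new HashMap<>();
        for (Doc d : index) {
            byNodeId.computeIfAbsent(d.nodeId(), k -> new ArrayList<>()).add(d);
        }

        // Depth 0: the documents matched by the start query.
        Set<Doc> visited = new HashSet<>();
        Set<Doc> startNodes = new HashSet<>();
        List<Doc> frontier = new ArrayList<>();
        for (Doc d : index) {
            if (q.test(d)) {
                startNodes.add(d);
                visited.add(d);
                frontier.add(d);
            }
        }

        int depth = 0;
        while (!frontier.isEmpty() && (maxDepth < 0 || depth < maxDepth)) {
            // Gather the edge ids of the whole frontier: the terms of the OR query.
            Set<String> edgeIds = new HashSet<>();
            for (Doc d : frontier) {
                edgeIds.addAll(d.edgeIds());
            }
            // "Execute" the OR query; keep only unvisited documents that pass
            // the traversalFilter (cycle detection via the visited set).
            List<Doc> next = new ArrayList<>();
            for (String id : edgeIds) {
                for (Doc d : byNodeId.getOrDefault(id, List.of())) {
                    if (traversalFilter.test(d) && visited.add(d)) {
                        next.add(d);
                    }
                }
            }
            frontier = next;
            depth++;
        }

        // Post-filters on the reached set; sorted only for stable output.
        Set<String> result = new TreeSet<>();
        for (Doc d : visited) {
            if (!returnStartNodes && startNodes.contains(d)) continue;
            if (onlyLeafNodes && !d.edgeIds().isEmpty()) continue;
            result.add(d.nodeId());
        }
        return result;
    }

    public static void main(String[] args) {
        // A -> B, B -> C and D, C -> A (a cycle); D has no edges (a leaf).
        List<Doc> index = List.of(
                new Doc("A", List.of("B")),
                new Doc("B", List.of("C", "D")),
                new Doc("C", List.of("A")),
                new Doc("D", List.of()));
        Predicate<Doc> startAtA = d -> d.nodeId().equals("A");
        System.out.println(traverse(index, startAtA, d -> true, -1, true, false)); // [A, B, C, D]
        System.out.println(traverse(index, startAtA, d -> true, 1, true, false));  // [A, B]
        System.out.println(traverse(index, startAtA, d -> true, -1, true, true));  // [D]
    }
}
```

Because documents are marked visited as soon as they are discovered, the A -> B -> C -> A cycle terminates with each document expanded once, and the result is a set of node ids with no record of the path that reached them.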

This query operator does not keep track of the path by which the traversal 
arrived at a document, only that the traversal did arrive at it.

  was:I have a GraphQuery that I implemented a long time back that allows a 
user to specify a "seedQuery" to identify which documents to start graph 
traversal from.  It then gathers up the edge ids for those documents, 
optionally applies an additional filter.  The query is then re-executed 
continually until no new edge ids are identified.  I am currently hosting this 
code up at https://github.com/kwatters/solrgraph and I would like to work with 
the community to get some feedback and ultimately get it committed back in as a 
lucene query.


> Create GraphQuery that allows graph traversal as a query operator.
> ------------------------------------------------------------------
>
>                 Key: SOLR-7543
>                 URL: https://issues.apache.org/jira/browse/SOLR-7543
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Kevin Watters
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
