[ https://issues.apache.org/jira/browse/SOLR-11384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459653#comment-16459653 ]

Jeroen Steggink commented on SOLR-11384:
----------------------------------------

Is this still relevant? We have a streaming expression for graph traversal.

> add support for distributed graph query
> ---------------------------------------
>
>                 Key: SOLR-11384
>                 URL: https://issues.apache.org/jira/browse/SOLR-11384
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Kevin Watters
>            Priority: Minor
>
> Creating this ticket to track the work that I've done on distributed graph 
> traversal support in Solr.
> The current GraphQuery only works on a single core, which limits where it can 
> be used and adds complexity if you want to scale it.  I believe there's a 
> strong desire to support a fully distributed way of doing the graph query.  
> I'm working on a patch; it's not complete yet, but if anyone would like to 
> have a look at the approach and implementation, I welcome any feedback.
> The flow for the distributed graph query is almost exactly the same as for 
> the normal graph query.  The only difference is how it discovers the 
> "frontier query" at each level of the traversal.
> When a distributed graph query request comes in, each shard begins by running 
> the root query, to know where to start on its shard.  Each participating 
> shard then discovers its edges for the next hop.  Those edges are broadcast 
> to all other participating shards.  Each shard then receives all the parts of 
> the frontier query, assembles it, and executes it.
> This process continues on each shard until there are no new edges left, or 
> the maxDepth of the traversal has been reached.
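> Roughly, the per-shard loop looks like the sketch below.  The abstract 
> methods are placeholders for the actual Solr plumbing, not existing APIs, and 
> an "edge" is represented here simply as the string value that joins one hop 
> to the next.
>
> import java.util.Set;
>
> /** Sketch of the per-shard traversal loop described above. */
> public abstract class DistributedTraversalSketch {
>
>   /** Run the root query on this shard, collect its hits, return their edge values. */
>   protected abstract Set<String> runRootQuery();
>
>   /** Run the assembled frontier query on this shard, collect its hits, return their edge values. */
>   protected abstract Set<String> runFrontierQuery(Set<String> frontier);
>
>   /** Broadcast this shard's edges and receive the edges from every other participating shard. */
>   protected abstract Set<String> exchangeWithOtherShards(Set<String> localEdges);
>
>   public void traverse(int maxDepth) {
>     // level 0: each shard runs the root query so it knows where to start on its shard
>     Set<String> localEdges = runRootQuery();
>     for (int depth = 0; depth < maxDepth; depth++) {
>       // broadcast this shard's edges and assemble the full frontier query for the next hop
>       Set<String> frontier = exchangeWithOtherShards(localEdges);
>       if (frontier.isEmpty()) {
>         break;                     // no new edges left on any shard
>       }
>       localEdges = runFrontierQuery(frontier);
>     }
>   }
> }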
> The approach is to introduce a FrontierBroker that resides as a singleton on 
> each of the Solr nodes in the cluster.  When a graph query is created, it can 
> call getInstance() on it so it can listen for the frontier parts coming in.
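> In outline, the broker could look something like the sketch below; the class 
> and method names are just placeholders, nothing like this exists in Solr yet.
>
> import java.util.List;
> import java.util.Map;
> import java.util.concurrent.BlockingQueue;
> import java.util.concurrent.ConcurrentHashMap;
> import java.util.concurrent.LinkedBlockingQueue;
>
> /** Sketch of the proposed per-node singleton that collects frontier parts. */
> public class FrontierBroker {
>   private static final FrontierBroker INSTANCE = new FrontierBroker();
>
>   // incoming frontier parts, keyed by graph query id
>   private final Map<String, BlockingQueue<String>> parts = new ConcurrentHashMap<>();
>
>   private FrontierBroker() {}
>
>   public static FrontierBroker getInstance() {
>     return INSTANCE;
>   }
>
>   /** Called when another shard publishes one part of the frontier for a query. */
>   public void receive(String queryId, String frontierPart) {
>     parts.computeIfAbsent(queryId, k -> new LinkedBlockingQueue<>()).add(frontierPart);
>   }
>
>   /** Block until every participating shard has delivered its part, then return them all. */
>   public List<String> awaitParts(String queryId, int expectedParts) throws InterruptedException {
>     BlockingQueue<String> q = parts.computeIfAbsent(queryId, k -> new LinkedBlockingQueue<>());
>     List<String> collected = new java.util.ArrayList<>(expectedParts);
>     while (collected.size() < expectedParts) {
>       collected.add(q.take());
>     }
>     return collected;
>   }
> }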
> Initially, I was using an external Kafka broker to handle this, and it did 
> work pretty well.  The new approach is to migrate the FrontierBroker to be a 
> request handler in Solr, and potentially to use the SolrJ client to publish 
> the edges to each node in the cluster.
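> With such a request handler in place, publishing a frontier part to each node 
> via SolrJ could look roughly like this.  The /graph/frontier path and the 
> parameter names are made up for illustration; the real handler would be 
> whatever the patch registers in solrconfig.xml.
>
> import java.util.List;
>
> import org.apache.solr.client.solrj.SolrClient;
> import org.apache.solr.client.solrj.SolrRequest;
> import org.apache.solr.client.solrj.impl.HttpSolrClient;
> import org.apache.solr.client.solrj.request.GenericSolrRequest;
> import org.apache.solr.common.params.ModifiableSolrParams;
>
> /** Sketch: push one frontier part to every participating node over HTTP. */
> public class FrontierPublisher {
>
>   public void publish(List<String> nodeBaseUrls, String queryId, String frontierPart)
>       throws Exception {
>     ModifiableSolrParams params = new ModifiableSolrParams();
>     params.set("queryId", queryId);
>     params.set("frontier", frontierPart);
>
>     for (String baseUrl : nodeBaseUrls) {
>       try (SolrClient client = new HttpSolrClient.Builder(baseUrl).build()) {
>         client.request(new GenericSolrRequest(SolrRequest.METHOD.POST, "/graph/frontier", params));
>       }
>     }
>   }
> }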
> There are a few outstanding design questions.  First, how do we know which 
> shards are participating in the current query request?  Is that information 
> easy to get at?
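> In SolrCloud mode I believe the cluster state has what we need, along the 
> lines of the sketch below (standard ZkController/ClusterState APIs, though I 
> haven't wired this into the patch yet); for a distributed search the shard 
> list may also already be available on the ResponseBuilder.
>
> import java.util.ArrayList;
> import java.util.List;
>
> import org.apache.solr.cloud.ZkController;
> import org.apache.solr.common.cloud.ClusterState;
> import org.apache.solr.common.cloud.DocCollection;
> import org.apache.solr.common.cloud.Replica;
> import org.apache.solr.common.cloud.Slice;
> import org.apache.solr.core.CoreContainer;
>
> /** Sketch: one replica URL per active shard of a collection (SolrCloud only). */
> public class ParticipatingShards {
>
>   public static List<String> shardUrls(CoreContainer cores, String collection) {
>     ZkController zk = cores.getZkController();        // null in standalone mode
>     ClusterState state = zk.getClusterState();
>     DocCollection coll = state.getCollection(collection);
>
>     List<String> urls = new ArrayList<>();
>     for (Slice slice : coll.getActiveSlices()) {       // one Slice per shard
>       Replica leader = slice.getLeader();
>       if (leader != null) {
>         urls.add(leader.getCoreUrl());
>       }
>     }
>     return urls;
>   }
> }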
> Second, we currently serialize a query object between the shards; perhaps we 
> should consider a slightly different abstraction and serialize lists of 
> "edge" objects between the nodes instead.  The point of this would be to 
> batch the exploration/traversal of the current frontier to help avoid large 
> bursts of memory being required.
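> For example, an edge could be as small as a (field, value) pair, and the 
> frontier shipped in fixed-size chunks, something like this sketch:
>
> import java.util.ArrayList;
> import java.util.List;
>
> /** Sketch of a minimal "edge" payload plus fixed-size batching of the frontier. */
> public class EdgeBatcher {
>
>   /** The field the value was collected from, and the value itself. */
>   public static class Edge {
>     public final String field;
>     public final String value;
>
>     public Edge(String field, String value) {
>       this.field = field;
>       this.value = value;
>     }
>   }
>
>   /** Split the frontier into batches so no node has to hold the whole thing at once. */
>   public static List<List<Edge>> batches(List<Edge> frontier, int batchSize) {
>     List<List<Edge>> out = new ArrayList<>();
>     for (int i = 0; i < frontier.size(); i += batchSize) {
>       out.add(frontier.subList(i, Math.min(i + batchSize, frontier.size())));
>     }
>     return out;
>   }
> }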
> Third, what sort of caching strategy should be introduced for the frontier 
> queries, if any?  And if we do some caching there, how and when should the 
> entries be expired and auto-warmed?
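> One simple starting point would be a small, size-bounded LRU keyed by query 
> id and depth, roughly as sketched below (not tied to Solr's existing cache 
> implementations); auto-warming would presumably mean re-running the stored 
> frontier queries against a new searcher, the way Solr's other caches warm.
>
> import java.util.LinkedHashMap;
> import java.util.Map;
>
> /** Sketch: a tiny LRU for assembled frontier queries, keyed by "queryId:depth". */
> public class FrontierQueryCache<V> extends LinkedHashMap<String, V> {
>   private final int maxEntries;
>
>   public FrontierQueryCache(int maxEntries) {
>     super(16, 0.75f, true);        // access order, so the least recently used entry ages out
>     this.maxEntries = maxEntries;
>   }
>
>   @Override
>   protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
>     return size() > maxEntries;    // evict once we exceed the cap
>   }
> }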


