I know exactly how this works. MarkLogic can be configured with separate 
execute and data nodes. But in MarkLogic, the execute nodes can do a lot of 
work. The query may be a mix of indexed searching and “table scan” searching, 
all expressed in an XQuery program.

It does not make sense for Solr. The distributed portion of query execution is 
just not enough work to be worth farming out to CPU-intensive nodes.

It will mean more nodes to do the same work.

The execute nodes would need to be part of the cluster in order to get config 
updates. But they would not host shards. So now we need a new kind of node in 
the collection.

Lots more code, lots more testing, no benefit.

wunder
Walter Underwood
[email protected]
http://observer.wunderwood.org/  (my blog)


> On Apr 19, 2017, at 9:40 AM, Dorian Hoxha <[email protected]> wrote:
> 
> @Walter:
> 
> Think of it this way.
> 
> 1. Separate data-solr from merger-solr, so merger-solr gets more CPU while 
> data-solr gets more RAM (very simplistic).
> This way you can also scale them separately. (Elasticsearch has something 
> like a search-only node.)
> 
> 2. The second step is to join the client-app with merger-solr, so you do one 
> less hop. This node doesn't have to lose the global IDF, the query cache, or 
> whatever else merger-only-solr currently does.
> If the client-app is just a frontend/proxy for Solr, then this should be better.
> 
> 3. The whole point is to have fewer, more powerful machines. And each 
> client-app should be able to saturate its own embedded-solr.
> 
> Makes sense?
> 
> On Wed, Apr 19, 2017 at 6:29 PM, Walter Underwood <[email protected]> wrote:
> That is exactly what I thought you meant. Adds complexity with no benefit.
> 
> Now the merger needs to keep caches for global IDF. But those caches don’t 
> get the benefit of other requests to the same cluster.
> 
> I’m not sure whether query result caches cache the results of distributed 
> queries, but if they do, then this approach loses that benefit too.
> 
> wunder
> Walter Underwood
> [email protected]
> http://observer.wunderwood.org/  (my blog)
> 
> 
>> On Apr 19, 2017, at 9:01 AM, Dorian Hoxha <[email protected]> wrote:
>> 
>> @Walter
>> 
>> Usually you have: client-app --> random-solr-node (merger) --> each node 
>> that has a shard.
>> What I want: client-app (merger in the same JVM) --> each node that has a 
>> shard.
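>>
>> In SolrJ terms, roughly (a rough sketch, not tested; the hostnames, the 
>> solr home path, and the core name "products" are made up, and it assumes 
>> an embedded core can fan out with an explicit "shards" list, since it is 
>> not ZooKeeper-aware):
>>
>> import java.nio.file.Paths;
>> import org.apache.solr.client.solrj.SolrQuery;
>> import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
>> import org.apache.solr.client.solrj.impl.HttpSolrClient;
>> import org.apache.solr.client.solrj.response.QueryResponse;
>>
>> public class OneHopSearch {
>>     public static void main(String[] args) throws Exception {
>>         // Today: client-app -> random Solr node (merger) -> shard nodes.
>>         try (HttpSolrClient remote = new HttpSolrClient.Builder(
>>                 "http://solr1:8983/solr/products").build()) {
>>             QueryResponse merged = remote.query(new SolrQuery("phone"));
>>             System.out.println("two hops: " + merged.getResults().getNumFound());
>>         }
>>
>>         // What I want: the merger runs inside the client JVM and fans
>>         // out to the shard nodes itself.
>>         try (EmbeddedSolrServer local = new EmbeddedSolrServer(
>>                 Paths.get("/opt/solr-home"), "products")) {
>>             SolrQuery q = new SolrQuery("phone");
>>             // Explicit shard list; the embedded core has no ZooKeeper view.
>>             q.set("shards", "solr1:8983/solr/products,solr2:8983/solr/products");
>>             QueryResponse direct = local.query(q);
>>             System.out.println("one hop: " + direct.getResults().getNumFound());
>>         }
>>     }
>> }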
>> 
>> Makes sense?
>> 
>> On Wed, Apr 19, 2017 at 4:50 PM, Walter Underwood <[email protected]> wrote:
>> Does not make sense to me. It would mean more queries from the client to the 
>> cluster, not fewer. And those HTTP requests would probably be slower than the 
>> intra-cluster requests.
>> 
>> I expect the distributed portion of the query load is small compared to 
>> other CPU usage.
>> 
>> It adds complexity for no gain in performance. Maybe a slight loss.
>> 
>> wunder
>> Walter Underwood
>> [email protected]
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
>>> On Apr 19, 2017, at 6:32 AM, Mikhail Khludnev <[email protected]> wrote:
>>> 
>>> Hello, Dorian.
>>> I'm not sure about 1, but you can create an EmbeddedSolrServer and add a 
>>> "collection" parameter. That's what's done in 
>>> org.apache.solr.response.transform.SubQueryAugmenter ([subquery]).
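>>>
>>> Something like this, as a rough sketch (the solr home path and the core 
>>> name "products" are made up, and I haven't verified how far "collection" 
>>> routing works from an embedded core):
>>>
>>> import java.nio.file.Paths;
>>> import org.apache.solr.client.solrj.SolrQuery;
>>> import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
>>>
>>> public class EmbeddedCollectionQuery {
>>>     public static void main(String[] args) throws Exception {
>>>         // Solr running inside the client JVM, using a local solr home.
>>>         try (EmbeddedSolrServer solr = new EmbeddedSolrServer(
>>>                 Paths.get("/var/solr"), "products")) {
>>>             SolrQuery q = new SolrQuery("phone");
>>>             // Route the request to a named collection; this is the same
>>>             // parameter the [subquery] transformer passes along.
>>>             q.set("collection", "products");
>>>             System.out.println(solr.query(q).getResults().getNumFound());
>>>         }
>>>     }
>>> }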
>>> 
>>> On Wed, Apr 19, 2017 at 3:53 PM, Dorian Hoxha <[email protected]> wrote:
>>> Hi friends,
>>> 
>>> Has anybody done this? The reasons: one less HTTP request when doing a 
>>> distributed search, but also not storing the data itself (like a 
>>> search-only node), and the other nodes not having to care about search nodes.
>>> 
>>> Makes sense?
>>> 
>>> Regards,
>>> Dorian
>>> 
>>> 
>>> 
>>> -- 
>>> Sincerely yours
>>> Mikhail Khludnev
>> 
>> 
> 
> 