Nick,

Thanks for the feedback. You may not have been looking for a reply this long and complex, but I wanted to share my thinking and validate some assumptions with the group before I get much further down the road.
Let me walk you through where my thinking is at, and see what you think. First, some observations:

* MultiSearcher and RemoteSearchable are deprecated in Java Lucene starting with 3.1 (http://mail-archives.apache.org/mod_mbox/lucene-java-user/201106.mbox/%[email protected]%3E), and for good reason: among other issues, they have known bugs related to scoring.

* IndexReader, as the service interface, results in excessive network chatter. Query, in my mind, sounds like the right abstraction: parse an incoming query request once, distribute the Query objects to core instances, then merge the results. IndexSearcher in 3.3 implements a merge-TopDocs method, so this approach seems promising. It would also enable each core to use a request queue to handle concurrent requests. Query, Filter, etc., have been marked serializable for a long time.

* I like Solr's separated Web/Core approach. The remoting-based approaches buy into a few of the eight fallacies of distributed computing; the Web/Core approach, not so much.

* Java Lucene delegated distributed search to Solr (and ElasticSearch, Katta, IndexTank, etc.) in v3.1 and later. This says that (a) distributed search is hard, and (b) it requires solving problems beyond the scope of Lucene. Unfortunately, it also highlights the lack of a .NET Solr analog.

These observations lead me to the following questions:

1. Jeez, it would be nice if we had a .NET Solr-ish project. Kidding, kidding. Kind of.
2. Should distributed search live in Contribs, or in another project altogether?
3. Is there value in an in-between solution for #2? Perhaps something like a Solr Core-only implementation, or a reference implementation that tackles a limited set of requirements?

I should disclose here that my interest in this code is part of a broader project I'm running at my place of employment. The project will be released as open source once it hits minimum viable product (it's not proprietary, just early in development).
This project is tightly integrated with ServiceStack. It is also currently self-hosted, with an IIS host coming shortly. That said, Web API is very ServiceStack-like, though ServiceStack has some additional benefits: .NET 3.5 and Mono support, out-of-the-box protocol buffers integration (plus around two dozen other serialization formats, including a very fast JSON serializer), nice cache and auth interfaces, and a simple plugin architecture. It's also based on the request/response pattern using strongly-typed DTOs, which I am a big proponent of. My project leverages these features quite a bit.

I anticipate following a model similar to Solr's Web/Core. The biggest questions I'm currently wrestling with are #3 and #2. Should the core be able to stand alone in a limited capacity? If so, does it make sense for it to live in Contribs? I would naturally prefer to use ServiceStack to build it, consistent with the rest of my project. I would also take advantage of its protocol buffers support to improve performance, since this would be a peer-to-peer API rather than a client-server API. However, if a standalone core were to live in Contribs, I would want to make sure most people are comfortable with that dependency. When I think of all the features that need to be implemented in a core, like configuration and authentication, I start heading back towards distributed search living outside of Contribs.

- Zack

On Aug 17, 2012, at 8:43 PM, Nicholas Paldino [.NET/C# MVP] <[email protected]> wrote:

> Zach,
>
> Just a suggestion, maybe going the Web API route and self-hosting (which
> allows for something more RESTful and with good bindings for JSON, XML, et
> al):
>
> http://code.msdn.microsoft.com/ASPNET-Web-API-Self-Host-30abca12
>
> http://www.asp.net/web-api/overview/hosting-aspnet-web-api/self-host-a-web-api
>
> - Nick
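P.S. To make the "distribute the Query, then merge the results" step concrete, here is a rough sketch of the coordinator-side merge: each core returns its local top hits, and the coordinator k-way merges them by score. This is plain Java with illustrative names (ShardHit, merge), not the actual Lucene or Lucene.NET API; the real implementation would merge TopDocs and handle tie-breaking, sort fields, and doc-id remapping.

```java
import java.util.*;

// Illustrative sketch only: a coordinator merging per-shard top-k hits by
// descending score, the way distributed search would combine per-core results.
public class TopDocsMergeSketch {

    // A hit from one shard: which shard it came from, its local doc id, its score.
    record ShardHit(int shard, int doc, float score) {}

    // Take the best `k` hits across all shards, ordered by descending score.
    static List<ShardHit> merge(int k, List<List<ShardHit>> perShard) {
        PriorityQueue<ShardHit> pq =
            new PriorityQueue<>((a, b) -> Float.compare(b.score(), a.score()));
        for (List<ShardHit> hits : perShard) {
            pq.addAll(hits);
        }
        List<ShardHit> out = new ArrayList<>();
        while (!pq.isEmpty() && out.size() < k) {
            out.add(pq.poll());
        }
        return out;
    }

    public static void main(String[] args) {
        List<ShardHit> shard0 = List.of(new ShardHit(0, 1, 0.9f), new ShardHit(0, 7, 0.4f));
        List<ShardHit> shard1 = List.of(new ShardHit(1, 3, 0.8f), new ShardHit(1, 2, 0.6f));
        for (ShardHit h : merge(3, List.of(shard0, shard1))) {
            System.out.println(h.shard() + ":" + h.doc() + " " + h.score());
        }
    }
}
```

The point of the sketch is that only small, serializable objects (the query going out, the scored hits coming back) ever cross the wire, which is what avoids the per-IndexReader-call chatter described above.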
