Nick,

Thanks for the feedback. You may not have been looking for a reply this long and complex, but I wanted to share my thinking and validate some assumptions with the group before I get much further down the road.
Let me walk you through where my thinking is at, and see what you think. First, some observations:

* MultiSearcher and RemoteSearchable are deprecated in Java Lucene starting with 3.1 (http://mail-archives.apache.org/mod_mbox/lucene-java-user/201106.mbox/%[email protected]%3E), and for good reason: among other issues, they have known bugs related to scoring.

* IndexReader, as the service interface, results in excessive network chatter. Query, in my mind, sounds like the right abstraction: parse an incoming query request once, distribute the Query objects to core instances, then merge the results. IndexSearcher in 3.3 implements a merge-TopDocs method, so this approach seems promising. It would also enable each core to use a request queue to handle concurrent requests. Query, Filter, etc., have been marked serializable for a long time.

* I like Solr's separated Web/Core approach. The remoting-based approaches buy into a few of the eight fallacies of distributed computing; the Web/Core approach, not so much.

* Java Lucene delegated distributed search to Solr (and ElasticSearch, Katta, IndexTank, etc.) in v3.1 and later. This says that (a) distributed search is hard, and (b) it requires solving problems beyond the scope of Lucene. Unfortunately, it also highlights the lack of a .NET Solr analog.

These observations lead me to the following questions:

1. Jeez, it would be nice if we had a .NET Solr-ish project. Kidding, kidding. Kind of.
2. Should distributed search live in Contribs, or in another project altogether?
3. Is there value in an in-between solution for #2? Perhaps something like a Solr Core-only implementation, or a reference implementation that tackles a limited set of requirements?

I should disclose here that my interest in this code is part of a broader project I'm running at my place of employment. The project will be released as open source once it hits minimum viable product (it's not proprietary, just early in development).
This project is tightly integrated with ServiceStack. It is also currently self-hosted, with an IIS host coming shortly. That said, Web API is very ServiceStack-like, though ServiceStack has some additional benefits: .NET 3.5 and Mono support, out-of-the-box protocol buffers integration (plus around two dozen other serialization formats, including a very fast JSON serializer), nice cache and auth interfaces, and a simple plugin architecture. It's also based on the request/response pattern using strongly-typed DTOs, which I am a big proponent of. My project leverages these features quite a bit.

I anticipate following a model similar to Solr's Web/Core. The biggest questions I'm currently wrestling with are #3 and #2. Should the core be able to stand alone in a limited capacity? If so, does it make sense for it to live in Contribs? I would naturally prefer to use ServiceStack to build it, consistent with the rest of my project. I would also take advantage of its protocol buffers support to improve performance, since this would be a peer-to-peer API rather than a client-server API. However, if a standalone core were to live in Contribs, I would want to make sure most people are comfortable with that dependency. When I think of all the features that need to be implemented in a core, like configuration and authentication, I start heading back towards distributed search living outside of Contribs.

- Zack

On Aug 17, 2012, at 8:43 PM, Nicholas Paldino [.NET/C# MVP] <[email protected]> wrote:

> Zach,
>
> Just a suggestion, maybe going the Web API route and self-hosting (which
> allows for something more RESTful and with good bindings for JSON, XML, et
> al):
>
> http://code.msdn.microsoft.com/ASPNET-Web-API-Self-Host-30abca12
>
> http://www.asp.net/web-api/overview/hosting-aspnet-web-api/self-host-a-web-api
>
> - Nick
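P.S. To make the "distribute the Query, then merge the results" step concrete, here is a rough sketch of the coordinator-side merge: each core returns its local top hits, and the coordinator k-way merges them by score. This is plain Java with illustrative names (ShardHit, merge), not the actual Lucene or Lucene.NET API; the real implementation would merge TopDocs and handle tie-breaking, sort fields, and doc-id remapping.

```java
import java.util.*;

// Illustrative sketch only: a coordinator merging per-shard top-k hits by
// descending score, the way distributed search would combine per-core results.
public class TopDocsMergeSketch {

    // A hit from one shard: which shard it came from, its local doc id, its score.
    record ShardHit(int shard, int doc, float score) {}

    // Take the best `k` hits across all shards, ordered by descending score.
    static List<ShardHit> merge(int k, List<List<ShardHit>> perShard) {
        PriorityQueue<ShardHit> pq =
            new PriorityQueue<>((a, b) -> Float.compare(b.score(), a.score()));
        for (List<ShardHit> hits : perShard) {
            pq.addAll(hits);
        }
        List<ShardHit> out = new ArrayList<>();
        while (!pq.isEmpty() && out.size() < k) {
            out.add(pq.poll());
        }
        return out;
    }

    public static void main(String[] args) {
        List<ShardHit> shard0 = List.of(new ShardHit(0, 1, 0.9f), new ShardHit(0, 7, 0.4f));
        List<ShardHit> shard1 = List.of(new ShardHit(1, 3, 0.8f), new ShardHit(1, 2, 0.6f));
        for (ShardHit h : merge(3, List.of(shard0, shard1))) {
            System.out.println(h.shard() + ":" + h.doc() + " " + h.score());
        }
    }
}
```

The point of the sketch is that only small, serializable objects (the query going out, the scored hits coming back) ever cross the wire, which is what avoids the per-IndexReader-call chatter described above.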
