On Mon, Nov 7, 2011 at 1:50 PM, Marvin Humphrey <[email protected]> wrote:
> On Sun, Nov 06, 2011 at 08:39:51PM -0800, Dan Markham wrote:
>> ZeroMQ and Google's Protocol Buffers both looking great for building a
>> distributed search solution.
>
> The idea of normalizing our current ad-hoc serialization mechanism using
> Google Protocol Buffers seems interesting, though it looks like it might
> be a lot of work and messy besides.
>
> First, Protocol Buffers doesn't support C -- only C++, Python and Java --
> so we'd have to write our own custom plugin.  Dunno how hard that is.

While I'm relying on Google rather than hands-on experience, I don't think
that C support is actually a problem.  There seem to be C bindings:

    http://code.google.com/p/protobuf-c/

Or roll your own:

    http://blog.reverberate.org/2008/07/12/100-lines-of-c-that-can-parse-any-protocol-buffer/
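To make the protobuf-c route concrete, here's a rough, untested sketch.
The message definition and field names are invented for illustration;
the calls just follow the naming convention protoc-c uses for generated
code:

    /* Hypothetical search_request.proto (proto2 syntax):
     *
     *   message SearchRequest {
     *     required string query      = 1;
     *     required uint32 num_wanted = 2;
     *   }
     *
     * protoc-c would emit search_request.pb-c.{h,c}; everything below
     * uses the functions it generates for that message.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include "search_request.pb-c.h"

    int
    main(void) {
        /* Fill in a request and pack it into a byte buffer. */
        SearchRequest req = SEARCH_REQUEST__INIT;
        req.query      = "apache lucy";
        req.num_wanted = 10;

        size_t   len = search_request__get_packed_size(&req);
        uint8_t *buf = malloc(len);
        search_request__pack(&req, buf);

        /* ... ship buf over the wire; the far end unpacks it ... */
        SearchRequest *got = search_request__unpack(NULL, len, buf);
        printf("query='%s' num_wanted=%u\n", got->query, got->num_wanted);

        search_request__free_unpacked(got, NULL);
        free(buf);
        return 0;
    }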
> Second, the Protocol Buffers compiler is a heavy dependency -- too big to
> bundle.  We'd have to capture the generated source files in version
> control.

Alternatively, it could just be a dependency.  While I recognize your
desire to keep the core free of such dependencies, I think it's entirely
reasonable for LucyX packages to require outside libraries and tools.  The
question would be whether it's reasonable or desirable to relegate
ClusterSearch to non-core.

> Further investigation seems warranted.  It would sure be nice if we could
> lower our costs for developing and maintaining serialization routines.

On Mon, Nov 7, 2011 at 2:39 PM, Nick Wellnhofer <[email protected]> wrote:
> MessagePack might be worth a look. See http://msgpack.org/

Yes, that looks good too.  I'm not suggesting that we restrict ourselves
to Protocol Buffers, only that it should be possible to use them for
interprocess communication, among other options.  A good architecture (in
my opinion) would be one that allows the over-the-wire protocol to change
without requiring in-depth knowledge of Lucy's internals.  I think the key
is to have a clear definition of what "information" is required by each
layer of Lucy, rather than serializing and deserializing raw objects [1].

> As for ZeroMQ, it's LGPL which pretty much rules it out for us -- nothing
> released under the Apache License 2.0 can have a required LGPL
> dependency.

You know these rules better than I do, but I worry that your
interpretations are often stricter than Apache's legal counsel requires.
There's room for optional dependencies:

    http://www.apache.org/legal/resolved.html#optional

For example, it looks like Apache Thrift (another alternative protocol to
consider) isn't scared of ZeroMQ:

    https://issues.apache.org/jira/browse/THRIFT-812

>> Regardless of the path we go for building / shipping clustered search
>> solution. I'm mostly interested in the api's to the lower level lucy
>> that make it possible and how to make them better.

> Well, my main concern, naturally, is the potential burden of exposing
> low-level internals as public APIs, constraining future Lucy core
> development.

It's a good concern, and I'm not certain what Dan is envisioning, but I'm
hoping that improving the APIs means _less_ exposure of the internals.
Rather than passing around Searcher and Index objects everywhere, I'd love
to make it explicitly clear what information is available to whom: if a
remote client doesn't return it, you can't use it.  Instead of increasing
exposure for remote clients, we'd simplify the interface to local
Searchers.

> If we actually had a working networking layer, we'd have a better idea
> about what sort of APIs we'd need to expose in order to facilitate
> alternate implementations.  Rapid-prototyping a networking layer in Perl
> under LucyX with a very conservative API exposure and without hauling in
> giganto dependencies might help with that.  :)

Yes!  I don't want to stand in the way of progress.  Prototyping something
that works is a great idea.  I don't have the fear of dependencies that
you do, but if you think it's faster to build something simple from the
ground up rather than using a complex existing package, have at it!

--nate
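[1] To make the "information contract" idea concrete, here's a
hypothetical sketch (all names invented) of the plain-data structs a
remote search node might exchange with a coordinator.  The point is that
the transport only ever sees these fields, never a live Searcher or Index
object, so the over-the-wire encoding can change freely:

    #include <stdint.h>

    /* One hit from a remote node.  Score merging happens on the
     * coordinator, so a raw score is all we need here. */
    typedef struct {
        int32_t doc_id;
        float   score;
    } RemoteHit;

    /* Everything a remote node needs in order to answer a query... */
    typedef struct {
        const char *query_string;
        uint32_t    num_wanted;
    } RemoteRequest;

    /* ...and everything it is allowed to send back.  If a field isn't
     * here, the coordinator can't use it. */
    typedef struct {
        uint32_t   num_hits;
        RemoteHit *hits;      /* num_hits entries, best first */
        uint64_t   doc_count; /* for corpus-wide scoring statistics */
    } RemoteResponse;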
