This also answers Grant Ingersoll's question about the use case, which may clarify things.

Without revealing TOO much about our internal structure, we are in the process of replacing SOAP communications in house with Protocol Buffers. We did evaluate Thrift as well, but decided on Protocol Buffers. A large effort for that conversion is well under way.

I've been asked whether Solr can support this, and to create a prototype to see if there are similar gains. I don't imagine the gains will match what we've seen over SOAP, but I do foresee some increase in throughput.

So, in response to the suggestion of other binary formatting technologies, my hands are tied. This is the prototype I have to work on for now. If it works out, I will gladly share it. If not, I will share why, and hopefully save others some time.

As for Protocol Buffers not supporting the NamedList structure: Google's documentation strongly suggests creating intermediate (bean) classes instead of trying to marshal and unmarshal your object model directly. This intermediate model doesn't have to precisely mirror the NamedList; it can be *any* compromise that gets the data from A to B, as long as the NamedList can be reconstituted on the other side. I'm sure something can be done.

Thanks,
Todd Feak

-----Original Message-----
From: Shalin Shekhar Mangar [mailto:[EMAIL PROTECTED]
Sent: Thursday, October 16, 2008 8:17 AM
To: [email protected]
Subject: Re: Offer to submit some custom enhancements

Hi Todd,

AFAIK, protocol buffers cannot be used for Solr because it is unable to support the NamedList structure that all Solr components use. The binary protocol (NamedListCodec) that SolrJ uses to communicate with Solr server is extremely optimized for our response format. However it is Java only.

There are other projects such as Apache Thrift (http://incubator.apache.org/thrift/) and Etch (both in incubation) which can be looked at.

There are a few issues in Thrift which may help us in the future:
https://issues.apache.org/jira/browse/THRIFT-110
https://issues.apache.org/jira/browse/THRIFT-122

On Thu, Oct 16, 2008 at 12:18 AM, Feak, Todd <[EMAIL PROTECTED]> wrote:

> Reposting, as I inadvertently thread hijacked on the first one. My bad.
>
> Hi all,
>
> I have a handful of custom classes that we've created for our purposes
> here. I'd like to share them if you think they have value for the rest
> of the community, but I wanted to check here before creating JIRA
> tickets and patches.
>
> Here's what I have:
>
> 1. DoubleMetaphoneFilter and Factory. This replaces usage of the
> PhoneticFilter and Factory allowing access to set maxCodeLength() on the
> DoubleMetaphone encoder and access to the "alternate" encodings that the
> encoder provides for some words.
>
> 2. JapaneseHalfWidthFilter and Factory. Some Japanese characters (and
> Latin alphabet) exist in both a FullWidth and HalfWidth form. This
> filter normalizes by switching to the FullWidth form for all the
> characters. I have seen at least one JIRA ticket about this issue. This
> implementation doesn't rely on Java 1.6.
>
> 3. JapaneseHiraganaFilter and Factory. Japanese Hiragana can be
> translated to Katakana. This filter normalizes to Katakana so that data
> and queries can come in either way and get hits.
>
>
> Also, I have been requested to create a prototype that you may be
> interested in. I'm to construct a QueryResponseWriter that returns
> documents using Google's Protocol Buffers. This would rely on an
> existing patch that exposes the OutputStream, but I would like to start
> the work soon. Are there license concerns that would block sharing this
> with you? Is there any interest in this?
>
> Thanks for your consideration,
> Todd Feak
>

--
Regards,
Shalin Shekhar Mangar.
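
For readers who want to see what the Hiragana-to-Katakana normalization in item 3 of the quoted message could look like, here is a minimal sketch. It is not Todd's JapaneseHiraganaFilter, which has not been posted; the class name HiraganaToKatakanaSketch and the method normalize are hypothetical, and the Lucene TokenFilter wrapping is left out. It relies only on the fact that the Unicode Hiragana letters U+3041 to U+3096 sit exactly 0x60 below their Katakana counterparts.

```java
// Hypothetical illustration only: not the filter described in the thread.
final class HiraganaToKatakanaSketch {
    // Hiragana letters occupy U+3041..U+3096 in Unicode.
    private static final char HIRAGANA_START = 0x3041;
    private static final char HIRAGANA_END = 0x3096;
    // The matching Katakana letter is always 0x60 code points higher.
    private static final int KATAKANA_OFFSET = 0x60;

    /** Returns a copy of the input with every Hiragana letter replaced by Katakana. */
    static String normalize(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c >= HIRAGANA_START && c <= HIRAGANA_END) {
                c = (char) (c + KATAKANA_OFFSET);
            }
            // Everything else (Katakana, Kanji, Latin, digits) passes through unchanged.
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "hiragana" written in Hiragana, given as escapes to keep the source ASCII-only.
        System.out.println(normalize("\u3072\u3089\u304C\u306A"));
    }
}
```

Applying the same mapping at both index time and query time is what lets data and queries arrive in either script and still match. Whether the real filter also handles the iteration marks (U+309D, U+309E) is not stated in the thread, so the sketch leaves them alone.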
