RE: Offer to submit some custom enhancements

Feak, Todd Thu, 16 Oct 2008 08:58:35 -0700

I'll also answer Grant Ingersoll's question about the use case here,
which may clarify things.

Without revealing TOO much about our internal structure, we are in the
process of replacing SOAP communications in-house with Protocol Buffers.
We did evaluate Thrift as well, but decided on Protocol Buffers. A large
effort for that conversion is well under way. I've been asked whether
Solr can support this, and to create a prototype to see if there are
similar gains. I don't expect gains as large as those we've seen over
SOAP, but I do foresee some increase in throughput.

So, in response to the suggestion of other binary formatting
technologies, my hands are tied. This is the prototype I have to work on
for now. If it works out, I will gladly share it. If not, I will share
why, and hopefully save others some time.

As for Protocol Buffers not supporting the NamedList structure: Google's
documentation strongly suggests that intermediate (bean) classes be
created, instead of trying to marshal and unmarshal your object model
directly. This intermediate model doesn't have to mirror the NamedList
precisely; it can be *any* compromise that gets the data from A to B, as
long as the NamedList can be reconstituted on the other side. I'm sure
something can be done.
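
To make the idea concrete, here is a rough sketch of what I mean by an
intermediate model. It is plain Java only, with no Protocol Buffers or
Solr classes involved. NamedListBridge, WireEntry, toWire and fromWire
are names I made up for illustration, and the ordered list of
name/value pairs is just a stand-in for NamedList (ordered, repeated
names allowed, names may be null). A real prototype would map WireEntry
onto a generated message class.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class NamedListBridge {

    /** Hypothetical intermediate bean; each field would map to a simple message field. */
    public static final class WireEntry {
        String name;              // may be null, since unnamed entries are allowed
        String text;              // set when the value is a String
        Integer number;           // set when the value is an Integer
        List<WireEntry> children; // set when the value is a nested list of pairs
    }

    /** Flatten ordered name/value pairs (the NamedList stand-in) into beans. */
    public static List<WireEntry> toWire(List<Map.Entry<String, Object>> pairs) {
        List<WireEntry> out = new ArrayList<>();
        for (Map.Entry<String, Object> p : pairs) {
            WireEntry e = new WireEntry();
            e.name = p.getKey();
            Object v = p.getValue();
            if (v instanceof String) {
                e.text = (String) v;
            } else if (v instanceof Integer) {
                e.number = (Integer) v;
            } else if (v instanceof List) {
                @SuppressWarnings("unchecked")
                List<Map.Entry<String, Object>> nested =
                        (List<Map.Entry<String, Object>>) v;
                e.children = toWire(nested);
            } else if (v != null) {
                // This sketch only handles three value types; null stays "all unset".
                throw new IllegalArgumentException("unsupported value: " + v);
            }
            out.add(e);
        }
        return out;
    }

    /** Reconstitute the ordered pairs on the receiving side. */
    public static List<Map.Entry<String, Object>> fromWire(List<WireEntry> entries) {
        List<Map.Entry<String, Object>> out = new ArrayList<>();
        for (WireEntry e : entries) {
            Object v = null;
            if (e.text != null) {
                v = e.text;
            } else if (e.number != null) {
                v = e.number;
            } else if (e.children != null) {
                v = fromWire(e.children);
            }
            out.add(new SimpleEntry<>(e.name, v));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Object>> doc = new ArrayList<>();
        doc.add(new SimpleEntry<>("id", "42"));
        doc.add(new SimpleEntry<>("id", "43")); // a repeated name keeps its position
        List<Map.Entry<String, Object>> response = new ArrayList<>();
        response.add(new SimpleEntry<>("numFound", 2));
        response.add(new SimpleEntry<>("docs", doc));
        // Round trip: the rebuilt structure should equal the original.
        System.out.println(fromWire(toWire(response)).equals(response));
    }
}
```

The point is only that order, repeated names and nesting survive the
round trip. Which value types Solr responses actually need covered is
something the prototype would have to work out.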

Thanks,
Todd Feak

-----Original Message-----
From: Shalin Shekhar Mangar [mailto:[EMAIL PROTECTED] 
Sent: Thursday, October 16, 2008 8:17 AM
To: [email protected]
Subject: Re: Offer to submit some custom enhancements

Hi Todd,

AFAIK, protocol buffers cannot be used for Solr because it is unable to
support the NamedList structure that all Solr components use.

The binary protocol (NamedListCodec) that SolrJ uses to communicate with
Solr server is extremely optimized for our response format. However it
is Java only.

There are other projects such as Apache Thrift
(http://incubator.apache.org/thrift/) and Etch (both in incubation)
which can be looked at. There are a few issues in Thrift which may help
us in the future:

https://issues.apache.org/jira/browse/THRIFT-110
https://issues.apache.org/jira/browse/THRIFT-122

On Thu, Oct 16, 2008 at 12:18 AM, Feak, Todd
<[EMAIL PROTECTED]> wrote:

> Reposting, as I inadvertently thread hijacked on the first one. My
> bad.
>
> Hi all,
>
> I have a handful of custom classes that we've created for our purposes
> here. I'd like to share them if you think they have value for the rest
> of the community, but I wanted to check here before creating JIRA
> tickets and patches.
>
> Here's what I have:
>
> 1. DoubleMetaphoneFilter and Factory. This replaces usage of the
> PhoneticFilter and Factory allowing access to set maxCodeLength() on
> the DoubleMetaphone encoder and access to the "alternate" encodings
> that the encoder provides for some words.
>
> 2. JapaneseHalfWidthFilter and Factory. Some Japanese characters (and
> Latin alphabet) exist in both a FullWidth and HalfWidth form. This
> filter normalizes by switching to the FullWidth form for all the
> characters. I have seen at least one JIRA ticket about this issue.
> This implementation doesn't rely on Java 1.6.
>
> 3. JapaneseHiraganaFilter and Factory. Japanese Hiragana can be
> translated to Katakana. This filter normalizes to Katakana so that
> data and queries can come in either way and get hits.
>
>
> Also, I have been requested to create a prototype that you may be
> interested in. I'm to construct a QueryResponseWriter that returns
> documents using Google's Protocol Buffers. This would rely on an
> existing patch that exposes the OutputStream, but I would like to
> start the work soon. Are there license concerns that would block
> sharing this with you? Is there any interest in this?
>
> Thanks for your consideration,
> Todd Feak
>

-- 
Regards,
Shalin Shekhar Mangar.
