Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

John Wang Thu, 04 Dec 2008 16:19:02 -0800

Hi Grant:
     I agree and I apologize for hijacking this thread. If Luceners feel our
criticisms are invalid, then so be it.


     We should focus on this issue, namely the serialization story in Lucene,
not general Java serialization, so I don't see how it would help to move
this to the java-dev list.

      As far as Lucene serialization goes, incorporating comments from various
people, this is what I gather are the choices (feel free to correct me):

1) Remove implementation and support of Serializable: We all agreed this is
bad and breaks backward compatibility.

2) Do nothing to the code base, fix the documentation, and clarify that Lucene
only supports Serialization between components using the release jar. This
seems to be the suggested approach, about which I have a couple of concerns:

a) Even given the exact same code base, due to the nature of Java
serialization, builds of the jar made with different VMs (IBM VM vs. Sun VM
vs. JRockit, etc.) cannot guarantee compatibility. Thus we are forcing users
who care about Serialization to use the release jar.

b) There is at least one place, as I have previously mentioned, namely
ScoreDocComparator, whose contract returns a Comparable that, per the javadoc,
must be serializable. How should this be treated? This can be an application
object. Should we pass the same enforcement on there when merge/sort is
happening across the wire, since a similar serialization problem would break
inside MultiSearcher?

3) Clean up the serialization story: either add a serialVersionUID (SUID) or
implement Externalizable for the classes within Lucene that implement
Serializable:

From what I am told, this is too much work for the committers.
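
To make the two variants of option 3 concrete, here is a rough sketch. It is
not Lucene code: PinnedTerm, ExternalTerm, the FORMAT constant and the helper
methods are hypothetical names made up for illustration. The first class keeps
default serialization but declares an explicit serialVersionUID, so the value
no longer depends on what a particular compiler generates. The second
implements Externalizable and writes its own versioned format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

class SerializationSketch {

    // Variant A (hypothetical class): default serialization with a pinned SUID.
    public static class PinnedTerm implements Serializable {
        private static final long serialVersionUID = 1L;
        final String field;
        final String text;

        PinnedTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    // Variant B (hypothetical class): the class controls its own wire format.
    public static class ExternalTerm implements Externalizable {
        // The SUID is still written to the stream for Externalizable classes.
        private static final long serialVersionUID = 1L;
        // Assumed format marker so a later revision can detect older streams.
        private static final int FORMAT = 1;
        String field;
        String text;

        // Externalizable requires a public no-arg constructor.
        public ExternalTerm() {}

        ExternalTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(FORMAT);
            out.writeUTF(field);
            out.writeUTF(text);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            int format = in.readInt();
            if (format != FORMAT) {
                throw new IOException("unsupported format " + format);
            }
            field = in.readUTF();
            text = in.readUTF();
        }
    }

    // Serialize an object to a byte array, as would happen before sending it.
    static byte[] toBytes(Object o) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(o);
        }
        return buf.toByteArray();
    }

    // Deserialize an object from a byte array, as the receiving side would.
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    // Report the SUID the serialization runtime will use for a class.
    static long suidOf(Class<?> c) {
        return ObjectStreamClass.lookup(c).getSerialVersionUID();
    }

    public static void main(String[] args) throws Exception {
        PinnedTerm p = (PinnedTerm) fromBytes(toBytes(new PinnedTerm("body", "lucene")));
        ExternalTerm e = (ExternalTerm) fromBytes(toBytes(new ExternalTerm("body", "lucene")));
        System.out.println(p.field + ":" + p.text + " suid=" + suidOf(PinnedTerm.class));
        System.out.println(e.field + ":" + e.text + " suid=" + suidOf(ExternalTerm.class));
    }
}
```

Calling suidOf on a class that does not declare serialVersionUID returns the
computed default, which is the value that may differ between builds as
described in 2a.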

I hope you guys at least agree with me that, the way it is currently, the
serialization story is broken, whether in documentation or in code. I see
the disagreement being about its severity and whether it is a trivial fix,
which I have learned is not really my place to say.

Please do understand this is not a far-fetched, made-up use case. We are
running into this in production, and we are developing in accordance with
the Lucene documentation.

Thanks

-John

On Thu, Dec 4, 2008 at 3:23 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:

>
> On Dec 4, 2008, at 2:21 PM, Jason Rutherglen wrote:
>
>> To put things in perspective, I believe Microsoft (who could potentially
>> place a lot of resources towards Lucene) now uses Lucene through Powerset?
>> and I don't think those folks are contributing back.  I know of several
>> other companies who do the same, and many potential contributions that are
>> not submitted because people and their companies do not see the benefit of
>> going through the hoops required to get patches committed.  A relatively
>> simple patch such as 1473 Serialization represents this well.
>>
>
> What do you suggest?  We didn't force anyone to use Lucene.  Heck, most of
> our users don't even ever participate on the mailing list.
>
> We do provide a very clear, transparent path for making contributions and
> becoming a committer.  I don't know what else we can do, but we're totally
> open to suggestions on how to improve it.
>
> FWIW, just b/c you think 1473 is trivial doesn't make it so.  You have a
> single use case and that's all you care about.  The community has dozens, if
> not hundreds of use cases, and your "trivial" patch may not be so trivial in
> that regard.  How would you feel if we "broke" something that you have
> relied on for years in the name of us moving faster?  I am willing to bet
> the large number of people here in Lucene appreciate our deliberations for
> the most part.  As for my opinion on 1473, I personally think there are
> better ways of achieving what you are trying to do, as Robert and others
> have suggested and I don't think it is worth it to maintain serialization
> across versions as it is too large a burden, IMO.  But, heh, make an
> argument (preferably w/o the accusations) and convince me otherwise.
>
>
>>
>> For example if a company is developing custom search algorithms, Lucene
>> supports TF/IDF but not much else.  Custom search algorithms require
>> rewriting lots of Lucene code.  Companies who write new search algorithms do
>> not necessarily want to rewrite Lucene as well to make it pluggable for new
>> scoring as it is out of scope, they will simply branch the code.  It does
>> not help that the core APIs underneath IndexReader are protected and package
>> protected which assumes a user that is not advanced.  It is repeated in the
>> mailing lists that new features will threaten the existing user base which
>> is based on opinion rather than fact.  More advanced users are currently
>> hindered by the conservatism of the project and so naturally have stopped
>> trying to submit changes that alter the core non-public code.
>>
>
> So, you're mad at us for others not contributing back their forks?  Even the
> ones we don't know about?  Simply put, I'm sorry we can't please you.  If
> you go read the archives, you will see plenty of times when even us
> committers have been frustrated from time to time by the process (just look
> at the JDK 1.5 debate, or the Interface/Abstract debate) but in the end, I
> feel Lucene is stronger for it.  Community over code, it's the Apache Way.
> You are free to disagree.  In fact, you have several options available to
> you to show that disagreement:
>
> 1. You can work to become a committer and change it from within.  The bar
>    really isn't that high, 3 to 4 non-trivial patches and a willingness to
>    work with others in a mostly pleasant way.
> 2. You can make us aware of the patches and be persistent about seeing it
>    through and we'll try to get to it.  Just look at CHANGES.txt and JIRA
>    and you will see that this happens all the time and from a wide variety
>    of contributors (including both you and John).
> 3. You can fork the code and go do your thing and build your own community,
>    etc.
>
> Personally, I hope you choose 1 or 2, as we're all stronger together than
> we are apart.
>
>
>>
>> The rancor is from users who would benefit from a faster pace and the
>> ability to be more creative inside the core Lucene system.  As the
>> internals change frequently and unannounced, the process of developing
>> core patches is difficult and frustrating.
>>
>
> I'm sorry that we can't work at a faster pace.  Suggestions on how to deal
> with the number of patches we have and still maintain quality and how to
> move forward w/o breaking old patches are much appreciated.
>
> As for the internals changing, you have just hit the nail on the head as to
> why it is so important to maintain back-compat.
>
> I simply don't get the unannounced part.  What isn't announced?  Geez, I've
> been a committer for a few years now, and I have yet to see another open
> source project that is as public as Lucene, for better or worse.  Look at
> the archives, we regularly even put our warts out for public consumption in
> an effort to improve ourselves.
>
> Rather than continue hijacking this thread, why don't we either let it die
> and focus on serialization, or we go over to java-dev and you and John and
> the rest of us can create a concrete list of suggestions that we think could
> make Lucene better and we can all discuss them in a positive manner and see
> how we can go about addressing them.  I'd be more than happy to discuss
> there if you want.
>
> Cheers,
>
> Grant
