I do agree, Hoss, that it makes sense to think about an IndexDocument and a SearchDocument or something along those lines for 3.0.

Also note, I added some comments to 1219 about the history of Fieldable.


On Mar 18, 2008, at 9:26 PM, Chris Hostetter wrote:


: Really, I think we could go just back to a single Field class instead
: of the three classes Fieldable, AbstractField and Field.  If we had
: this then LUCENE-1219 would be easier to cleanly implement.

It's probably worth reviewing the original reasons why Fieldable and
AbstractField were added...

        http://issues.apache.org/jira/browse/LUCENE-545

I'm not intimately familiar with most of this, but at its core, the
purpose seems primarily related to Fields in *returned* documents after a
search has been performed, particularly relating to lazy loading -- so
that alternate impls could be returned based on FieldSelector options.
I'm not sure how much consideration was given to the impacts on future
changes to the API of Documents/Fields being *indexed*.

somewhere i wrote a nice long diatribe on how in my opinion the biggest
flaw in Lucene's general API was the reuse of "Document" and "Field" for
two radically different purposes, such that half the methods in each
class are meaningless in half of the contexts they are used in... i
can't find it, but here's a less angry version of the same sentiment,
plus some followup discussion...

http://www.nabble.com/-jira--Created%3A-%28LUCENE-778%29-Allow-overriding-a-Document-to8406796.html
https://issues.apache.org/jira/browse/LUCENE-778

(note that parallel discussion occurred both in email replies and in Jira
comments, they are both worth reading)

All of this is fixable in Lucene 3.0, where we will be free to change the
API; but in the meantime, the fact that 2.3 uses an interface means we
are stuck with supporting it without changing it in 2.4 since right now
clients can implement their own Fieldable impl and then pass it to
Document.add(Fieldable) before indexing the doc.

(things would be a lot easier if the old Document.add(Field) had been
left alone and documented as being explicitly for *indexing* docs, while
a new method was used for Documents being returned by searches ... but
that's water under the bridge)

The best short term approach I can think of for addressing LUCENE-1219
in 2.4:
1) list the new methods in a new interface that extends Fieldable
   (ByteArrayReuseFieldable or something)
2) add the new methods to AbstractField so that it implements
   ByteArrayReuseFieldable
3) put an instanceof check for ByteArrayReuseFieldable in
   DocumentsWriter.

It's not pretty, but it's backwards compatible.
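
The three steps above can be sketched roughly as follows. This is only an
illustration: the types are simplified stand-ins, not the real Lucene 2.3
classes, and the method names binaryOffset() and binaryLength() as well as
the DocumentsWriterSketch class are hypothetical.

```java
// Simplified stand-in for the existing interface; NOT the real Lucene API.
interface Fieldable {
    String name();
    byte[] binaryValue();
}

// Step 1: the new methods live in a new interface that extends Fieldable.
// The method names here are hypothetical.
interface ByteArrayReuseFieldable extends Fieldable {
    int binaryOffset();
    int binaryLength();
}

// Step 2: the abstract base class implements the extended interface, so
// every subclass picks up the new methods without any change.
abstract class AbstractField implements ByteArrayReuseFieldable {
    private final String name;
    private final byte[] data;
    private final int offset;
    private final int length;

    AbstractField(String name, byte[] data, int offset, int length) {
        this.name = name;
        this.data = data;
        this.offset = offset;
        this.length = length;
    }

    public String name() { return name; }
    public byte[] binaryValue() { return data; }
    public int binaryOffset() { return offset; }
    public int binaryLength() { return length; }
}

final class Field extends AbstractField {
    Field(String name, byte[] data, int offset, int length) {
        super(name, data, offset, length);
    }
}

// Step 3: the indexing code checks for the extended interface and falls
// back to the old behavior for plain Fieldable implementations.
final class DocumentsWriterSketch {
    static int bytesToWrite(Fieldable f) {
        if (f instanceof ByteArrayReuseFieldable) {
            return ((ByteArrayReuseFieldable) f).binaryLength();
        }
        return f.binaryValue().length;
    }
}
```

A client class that implements only Fieldable keeps compiling and goes down
the fallback branch, which is the backwards-compatibility property the
approach is after.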


This reminds me of a slightly off topic idea that's been floating around
in the back of my head for a while relating to our backwards
compatibility commitments and the issues of interfaces and abstract
classes (which i haven't thought through all the way, but i'm throwing it
out there as long as we're talking about it) ...

Committers tend to prefer abstract classes for extension points because
it makes it easier to support backwards compatibility in the cases where
we want to add methods to extendable APIs and the "default" behavior for
these new methods is simple (or obvious delegation to existing methods)
so that people who have written custom impls can upgrade easily without
needing to invest time in making changes.

But abstract classes can be harder to mock when doing mock testing, and
some developers would prefer interfaces that they can implement with
their existing classes -- i suspect these people who would prefer
interfaces are willing to invest the time to make changes to their impls
when upgrading lucene if the interfaces were to change.

Perhaps the solution is a middle ground: altering our APIs such that all
extension points we advertise have both an abstract base class as well as
an interface, and all methods that take them as arguments use the
interface name. then we relax our backcompat commitments such that we
guarantee between minor releases that the interfaces won't change unless
the corresponding abstract base class changes to account for it ... so if
customers subclass the base class their code will continue to work, but
if they implement the interface directly, ignoring the base class, they
are on their own to ensure their code compiles against future minor
versions.
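
A rough sketch of what that pairing might look like. All of the names here
(TermWeigher, TermWeigherBase, LengthWeigher, Ranker) are made up for
illustration and are not Lucene types.

```java
// The advertised extension point: API methods accept the interface type.
interface TermWeigher {
    float weigh(String term);

    // Imagine this overload was added in a later minor release. Anyone who
    // implemented the interface directly has to add it before upgrading.
    float weigh(String term, int docFreq);
}

// The abstract base class changes in the same release to account for the
// new method, using obvious delegation to the existing one.
abstract class TermWeigherBase implements TermWeigher {
    public float weigh(String term, int docFreq) {
        return weigh(term);
    }
}

// Client code written against the earlier release: it subclasses the base
// class, so it still compiles and runs unchanged.
final class LengthWeigher extends TermWeigherBase {
    public float weigh(String term) {
        return term.length();
    }
}

// Library code only ever names the interface in its signatures.
final class Ranker {
    static float score(TermWeigher weigher, String term, int docFreq) {
        return weigher.weigh(term, docFreq);
    }
}
```

Here Ranker.score(new LengthWeigher(), "lucene", 7) goes through the
inherited default and returns the term length, while a mock or an existing
client class can still implement TermWeigher directly at its own risk.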

Like i said, i haven't thought it through completely, but at first glance
it seems like it would give both committers and lucene users a lot of
extra flexibility, without sacrificing much in the way of compatibility
commitments. the key would be in adopting it rigorously and religiously.

-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--------------------------
Grant Ingersoll
http://www.lucenebootcamp.com
Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ





