On 7/10/06, Doug Cutting <[EMAIL PROTECTED]> wrote:
Chuck Williams wrote:
> Lucene today allows many field properties to vary at the Field level.
> E.g., the same field name might be tokenized in one Field on a Document
> while it is untokenized in another Field on the same or different
> Document.

The rationale for this design was to keep the API simple.  I think of it
like variable declarations: some languages require them and some don't.
I opted to make Lucene fields like dynamically-typed variables.  In
part, Lucene's popularity is due to the simplicity of its API.

The irony has only just struck me: most people are happy with
"dynamically-typed" fields in Java (Lucene), but they didn't go down as
well in Ruby (Ferret).

However, in my uses of Lucene, most documents have the same fields used
in the same way, so I don't think I've ever actually taken much
advantage of this functionality.  It is nice to be able to add a field
to an index by changing the indexing code in a single place, where the
field's value is created, and not having to also change the index
initialization code.  We should try to keep such redundancies out of
user code.

Thus I would encourage any change in this direction to continue to
permit fields to be defined lazily, the first time they are added,
rather than requiring all fields to be declared up front.  Are there
substantial optimizations that are only possible if all fields are known
when the index is initialized?

I don't think declaring all fields up front is necessary for
substantial optimizations. I've found that the key to some really good
optimizations is having constant field numbers. That is, once a field
is added to the index, it is assigned a field number and keeps that
number for the life of the index. This allows one FieldInfos object
per index instead of one per segment. As I mentioned earlier, this
greatly optimizes the merging of term vectors and stored fields. The
only problem I could find with this solution is that fields are no
longer in alphabetical order in the term dictionary. I couldn't think
of a use case where that ordering is necessary, although I'm sure
there probably is one.
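
To make the constant-field-number idea a bit more concrete, here is a
rough sketch in Java. The class and method names (FieldNumberRegistry,
numberFor, nameFor) are made up for illustration; this is not Lucene's
FieldInfos or anything taken from Ferret, just one way the bookkeeping
could look. Fields are still defined lazily, the first time they are
seen, but once a number is handed out it never changes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical index-wide field registry (illustration only, not a
 * Lucene or Ferret class). One instance would live for the life of
 * the index instead of one per segment.
 */
final class FieldNumberRegistry {
    private final Map<String, Integer> numbersByName = new HashMap<>();
    private final List<String> namesByNumber = new ArrayList<>();

    /**
     * Returns the number for a field, assigning the next free number
     * the first time the field name is seen. Existing numbers are
     * never reassigned.
     */
    synchronized int numberFor(String fieldName) {
        Integer existing = numbersByName.get(fieldName);
        if (existing != null) {
            return existing;
        }
        int assigned = namesByNumber.size();
        numbersByName.put(fieldName, assigned);
        namesByNumber.add(fieldName);
        return assigned;
    }

    /** Reverse lookup: the field name that owns a given number. */
    synchronized String nameFor(int fieldNumber) {
        return namesByNumber.get(fieldNumber);
    }

    /** Number of distinct fields seen so far. */
    synchronized int size() {
        return namesByNumber.size();
    }
}
```

Because numbers are handed out in order of first appearance, "title"
added before "body" gets a lower number even though it sorts later
alphabetically, which is the ordering trade-off mentioned above. The
payoff, at least in this sketch, is that every segment agrees on what
field 0 or field 1 means, so a merge would not need to remap field
numbers in stored fields or term vectors.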

Anyway, hopefully we'll be able to lead the way with some brilliant
new ideas in the Lucy project. Put our money where our mouth is, so to
speak. If only I had a little more time right now.

Cheers,
Dave
