Hi, 

I've tried to summarize the discussion so far:

My proposal was to move the tokenized/binary/compressed
bits from *.fdt (field values) to *.fnm (field definitions). That
would make the intent of the code handling field attributes
much clearer and reduce the complexity of the code.
(You'll find the details in my first posting.)

As a tradeoff, one would lose the possibility of storing
the tokenized/binary/compressed attributes of a field on
a per-document basis; instead they would be stored as
global attributes of the field.

The other consequences of this refactoring would be:

-- binary format of *.fdt will change.
-- simpler code for writing/reading field attributes

The following would NOT be consequences:

-- you would not need to know all field definitions up front.
It would still be possible to add new fields to documents
at any time.

-- the handling of field norms will not change
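To illustrate the point that field definitions need not be known up front, here is a hypothetical field-definition table. FieldDefinitions is an illustrative name, not Lucene's FieldInfos. It assigns a number to each field the first time the field is seen. What should happen when a known field reappears with different attributes is not settled here; this sketch simply keeps the first definition, which is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a field-definition table that grows as new fields appear. */
class FieldDefinitions {
    private final Map<String, Integer> numbers = new HashMap<>();
    private final List<String> names = new ArrayList<>();
    private final List<Byte> flags = new ArrayList<>();

    /** Returns the field's number, registering the field the first time it is seen. */
    int add(String name, byte fieldFlags) {
        Integer existing = numbers.get(name);
        if (existing != null) {
            // Assumption: an already defined field keeps its original global flags.
            return existing;
        }
        int number = names.size();
        numbers.put(name, number);
        names.add(name);
        flags.add(fieldFlags);
        return number;
    }

    byte flagsOf(int number) { return flags.get(number); }
    String nameOf(int number) { return names.get(number); }
    int size() { return names.size(); }
}
```

A writer would call add() for every field of every document it indexes, so a field first seen in a late document is simply registered at that point.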

There are some proposals which go even further:

1. use a single field infos file (*.fnm) per index
2. make the field infos file human-readable
3. optimize merging of 1-document segments
(http://issues.apache.org/jira/browse/LUCENE-211)

While 3. is a completely different topic, the first two
may be worth discussing. Concerning the second
point, I'm personally reluctant, because it opens the
discussion of which format to choose, and those discussions
too often end in choosing XML, which would require
the whole XML bloat to be linked into any C library
implementing Lucene.


Robert
