On Wednesday 05 July 2006 13:23, Michael Busch wrote:
> Doug Cutting wrote:
> > Marvin Humphrey wrote:
> >> IMO, this should wait. It's going to be freakishly difficult to get
> >> this stuff to work and maintain the commitments that Doug has laid
> >> out for backwards compatibility.
> >
> > Perhaps we can implement an all-new index format, in a new package.
> > An implementation of IndexReader can be provided to integrate with
> > existing search code. And the ability to add an IndexReader to an
> > index can be provided to upgrade existing indexes to the new format.
> > So the new code would not need to be able to process an old index: the
> > old code can continue to do that. Does that make sense? Is that
> > "freakishly difficult"? We'll need the ability to sniff a directory
> > and tell which version of index it contains, but that should not be
> > too hard.
> >
> > Doug
>
> +1. I agree that this approach would make it much easier to develop a
> new index format without the commitment of being backward-compatible. I
> would like to help working on a new index format. Who else is going to
> work on it?
I am also interested in improving Lucene. It took me a while to respond to this thread because I am quite new to Lucene, so I had to learn what you were talking about, in particular what a payload is. But now I get it! :)

What I need to build is a web application that does faceted search. My current workaround is to transform each query into several queries, one per category. So I am interested in your current work.

I also had another issue with fields. Some fields can have a type (integer, date, string) and/or a language; this is typically metadata about the field. My quick workaround was to put this information into the field value itself, between square brackets, so I had to write a SkipPrefixTokenizer... dirty, but quick to implement.

Then I looked deeper into the Lucene file format, and I managed to introduce generic field metadata without breaking file format compatibility. I just used another bit in "Bits" to mark whether or not the field has metadata, and the metadata is stored right after it:

  DocFieldData  --> FieldCount, <FieldNum, Bits, FieldMetadata, Value>^FieldCount
  FieldMetadata --> ValueSize, <Byte>^ValueSize

Would this feature interest the Lucene committers? Should I submit a patch in JIRA? If not, is there a common place to share patches among Lucene hackers (i.e. not necessarily committers)?

Also, Marvin, could you share your payload patch? And is there a wiki page that serves as a starting point for defining the future index format?

Cheers,
Nicolas
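To make the proposed DocFieldData layout concrete, here is a minimal Java sketch of writing and reading one document's stored fields with the optional FieldMetadata. It is not Lucene's FieldsWriter/FieldsReader code: the class name, the StoredField holder and the flag value 0x08 (FIELD_HAS_METADATA) are hypothetical, and values are written as a VInt byte length followed by UTF-8 bytes, a simplification of Lucene's own string encoding. The VInt encoding (7 bits per byte, high bit set while more bytes follow) matches the variable-length integers used in Lucene's file format.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed stored-field layout (names and flag value are hypothetical):
//   DocFieldData  --> FieldCount, <FieldNum, Bits, FieldMetadata, Value>^FieldCount
//   FieldMetadata --> ValueSize, <Byte>^ValueSize   (present only when the flag is set)
final class FieldMetadataSketch {
    // Assumption: 0x08 is a spare bit in "Bits" that the existing format does not use.
    static final int FIELD_HAS_METADATA = 0x08;

    static final class StoredField {
        final int fieldNum;
        final int bits;
        final byte[] metadata; // null when the field carries no metadata
        final String value;

        StoredField(int fieldNum, int bits, byte[] metadata, String value) {
            this.fieldNum = fieldNum;
            this.bits = bits;
            this.metadata = metadata;
            this.value = value;
        }
    }

    // VInt: 7 bits per byte, low-order group first, high bit set while more bytes follow.
    static void writeVInt(DataOutputStream out, int i) throws IOException {
        while ((i & ~0x7F) != 0) {
            out.writeByte((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.writeByte(i);
    }

    static int readVInt(DataInputStream in) throws IOException {
        byte b = in.readByte();
        int i = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.readByte();
            i |= (b & 0x7F) << shift;
        }
        return i;
    }

    // Length-prefixed byte run: ValueSize, <Byte>^ValueSize.
    static void writeBytes(DataOutputStream out, byte[] bytes) throws IOException {
        writeVInt(out, bytes.length);
        out.write(bytes);
    }

    static byte[] readBytes(DataInputStream in) throws IOException {
        byte[] bytes = new byte[readVInt(in)];
        in.readFully(bytes);
        return bytes;
    }

    static void writeDocument(DataOutputStream out, List<StoredField> fields) throws IOException {
        writeVInt(out, fields.size()); // FieldCount
        for (StoredField f : fields) {
            writeVInt(out, f.fieldNum);
            int bits = f.metadata != null ? f.bits | FIELD_HAS_METADATA : f.bits;
            out.writeByte(bits);
            if (f.metadata != null) {
                writeBytes(out, f.metadata); // FieldMetadata sits between Bits and Value
            }
            // Simplification: UTF-8 bytes with a VInt length, not Lucene's own string encoding.
            writeBytes(out, f.value.getBytes(StandardCharsets.UTF_8));
        }
    }

    static List<StoredField> readDocument(DataInputStream in) throws IOException {
        int fieldCount = readVInt(in);
        List<StoredField> fields = new ArrayList<>(fieldCount);
        for (int n = 0; n < fieldCount; n++) {
            int fieldNum = readVInt(in);
            int bits = in.readUnsignedByte();
            // Only a reader that knows the flag can tell metadata bytes from the value.
            byte[] metadata = (bits & FIELD_HAS_METADATA) != 0 ? readBytes(in) : null;
            String value = new String(readBytes(in), StandardCharsets.UTF_8);
            fields.add(new StoredField(fieldNum, bits, metadata, value));
        }
        return fields;
    }
}
```

In this sketch, fields without metadata are written exactly as before (flag clear, no extra bytes), which is presumably what keeps existing indexes readable. The compatibility only goes one way, though: a reader unaware of the new bit would read the FieldMetadata bytes as the Value, so documents that actually carry metadata need a reader that knows the flag.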
