Hi,

On Monday 31 July 2006 17:28, robert engels wrote:
> Doing this breaks compatibility with non-Java Lucene implementations.

For me, that kind of compatibility is about the file format. Am I wrong? If
so, I don't see any compatibility break, as the default implementation of
FieldsDataWriter is the current one. And if I generate an index with my
custom writer, I will expect my index to be incompatible with other
implementations, even with other Java ones.

> Not sure it matters, but I thought I would point it out. I have
> always thought that Lucene should be compatible at an API level only,
> and MAYBE create a network access protocol for queries and updates.

I didn't talk about network access... I don't see your point...

>
> On Jul 31, 2006, at 10:25 AM, Nicolas Lalevée wrote:
> > On Friday 21 July 2006 12:37, Marvin Humphrey wrote:
> >> On Jul 21, 2006, at 1:23 AM, Nicolas Lalevée wrote:
> >>> In fact, that was my first implementation. The problem with that is
> >>> you can only store one value. But thinking a little more about it,
> >>> storing one or more values is not an issue, because with the solution
> >>> I proposed, no space is saved at all.
> >>> In fact, when I thought about this format of field metadata, I was
> >>> thinking about a way to let the Lucene user specify how to store it
> >>> in the Lucene index format. For instance, the simple one would
> >>> specify that it's a pointer to some metadata (as you proposed),
> >>> another one would specify that there are two pointers (in my use
> >>> case, one for the type, the other one for the language), and another
> >>> one would specify that it will be stored directly, as it is actually
> >>> an integer (so no need to make a pointer to an integer). But it was
> >>> just a thought, I don't know if it is possible. WDYT?
> >>
> >> I'm thinking that there would be a codecs file, say with the
> >> extension .cdx and this format:
> >>
> >> Codecs (.cdx) --> CodecCount, <CodecClassName>CodecCount
> >> CodecCount --> Uint32
> >> CodecClassName --> String
> >>
> >> That file would be read in its entirety when the index was
> >> initialized and expanded into an array of codec objects, one per
> >> CodecClassName.
> >>
> >> The .fdx file would add an additional int per doc...
> >>
> >> FieldIndex (.fdx) --> <FieldValuesPosition, FieldValuesCodecNumber>SegSize
> >> FieldValuesPosition --> Uint64
> >> FieldValuesCodecNumber --> Uint32
> >>
> >> Now, before you read any data from the .fdt file, you know how to
> >> interpret it. You seek the .fdt IndexInput to the right spot, then
> >> feed it to the appropriate codec object from the codecs array. The
> >> codec does the rest. In your case, you might write a codec that
> >> would read a few bytes and strings of metadata up front. Or you
> >> might have several different codecs, the identity of which indicates
> >> fixed values for certain metadata fields: FrenchDocument,
> >> ArabicDocument, etc.
> >>
> >> Would that scheme meet your needs?
> >
> > That looks good, but there is one restriction: it has to be per
> > document.
> > Let me explain my needs a little bit more.
> >
> > In fact my app has to index some data which is structured as an RDF
> > graph. Each RDF resource has a title and a description, each title and
> > description being in different languages. The model we chose is to map
> > an RDF resource to a document. Then the field name is the URI of the
> > RDF property, and the field value is the literal or another resource.
> > For instance:
> > doc1 : URI:http://foo.com title:[en]foo title:[fr]truc
> > So, in a document I will have several fields with different languages.
> > For my use case, in fact I need only one "codec".
> > It is a codec that will get 3 values, 2 of them being optional: a
> > language, a type, and a value.
> >
> > In fact I was thinking about a more generic version that would keep
> > the format compatible, keeping .fdx as is:
> >
> > FieldData (.fdt) --> <DocFieldData>SegSize
> > DocFieldData --> FieldCount, <FieldNum, RawData>FieldCount
> >
> > And the default FieldsDataWriter will be the current one: it will read
> > the RawData as Bits, Value, with Value --> String | BinaryValue, ...
> > Then, for my app, I will provide a custom FieldsDataWriter that will
> > do exactly what I want.
> >
> > What I don't know yet is how it breaks the API... because if I want to
> > provide my own FieldsDataWriter, I would also want to have my own
> > implementation of Fieldable...
> > If you think this is a good idea, I will try to implement it.
> >
> > cheers,
> > Nicolas
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
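
To make the FieldData layout quoted above more concrete (FieldCount, then
FieldNum and RawData for each field), here is a rough sketch of how a
pluggable writer/reader for RawData might look. It is only an illustration
under several assumptions: it uses plain java.io streams rather than Lucene's
IndexOutput/IndexInput, fixed-width ints stand in for Lucene's VInt encoding,
and the names RawFieldDataSketch, TypedValue, FieldDataCodec and LangTypeCodec
are hypothetical, not existing Lucene classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class RawFieldDataSketch {

    /** Hypothetical field value: optional language, optional type, mandatory value. */
    static final class TypedValue {
        final String language; // may be null
        final String type;     // may be null
        final String value;

        TypedValue(String language, String type, String value) {
            this.language = language;
            this.type = type;
            this.value = value;
        }
    }

    /** Hypothetical extension point: decides how the RawData of one field is laid out. */
    interface FieldDataCodec {
        void write(DataOutputStream out, TypedValue v) throws IOException;
        TypedValue read(DataInputStream in) throws IOException;
    }

    /** Writes one flags byte, then only the optional parts that are present, then the value. */
    static final class LangTypeCodec implements FieldDataCodec {
        private static final int HAS_LANG = 1;
        private static final int HAS_TYPE = 2;

        @Override
        public void write(DataOutputStream out, TypedValue v) throws IOException {
            int flags = (v.language != null ? HAS_LANG : 0) | (v.type != null ? HAS_TYPE : 0);
            out.writeByte(flags);
            if (v.language != null) out.writeUTF(v.language);
            if (v.type != null) out.writeUTF(v.type);
            out.writeUTF(v.value);
        }

        @Override
        public TypedValue read(DataInputStream in) throws IOException {
            int flags = in.readUnsignedByte();
            String language = (flags & HAS_LANG) != 0 ? in.readUTF() : null;
            String type = (flags & HAS_TYPE) != 0 ? in.readUTF() : null;
            return new TypedValue(language, type, in.readUTF());
        }
    }

    /** DocFieldData --> FieldCount, <FieldNum, RawData>FieldCount */
    static byte[] writeDoc(FieldDataCodec codec, int[] fieldNums, TypedValue[] values)
            throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(fieldNums.length);      // FieldCount
        for (int i = 0; i < fieldNums.length; i++) {
            out.writeInt(fieldNums[i]);      // FieldNum
            codec.write(out, values[i]);     // RawData, layout owned by the codec
        }
        out.flush();
        return bytes.toByteArray();
    }

    /** Reads back one document; the same codec must be used as for writing. */
    static TypedValue[] readDoc(FieldDataCodec codec, byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int count = in.readInt();
        TypedValue[] values = new TypedValue[count];
        for (int i = 0; i < count; i++) {
            in.readInt();                    // FieldNum, ignored in this sketch
            values[i] = codec.read(in);
        }
        return values;
    }
}
```

Since RawData carries no length of its own here, an index written with a
custom codec can only be read back with that same codec, which matches the
point made earlier in the thread about such an index not being readable by
other implementations.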