[ http://issues.apache.org/jira/browse/LUCENE-662?page=all ]

Nicolas Lalevée updated LUCENE-662:
-----------------------------------

    Attachment: generic-fieldIO-2.patch

I think I got it. What was disturbing in the last patch was the notion of 
FieldData that I had added, so I removed it. Let me summarize the differences 
between the trunk and my patch:

* The concepts:
** an IndexFormat defines which FieldsWriter and FieldsReader to use
** an IndexFormat defines the file extensions in use, so the user can add their 
own files
** the format of an index is attached to the Directory
** the index format isn't customizable as a whole, only parts of it are. Some 
functions are private or "default" (package-private), so the Lucene user won't 
have access to them: that is Lucene internal stuff. Others are public or 
protected: they can be redefined.
** Lucene now provides an API to add files that are tables of data, as 
FieldInfos is
** it is up to the FieldsWriter implementation to check whether the field to 
write is of the expected format (basically with an instanceof check).
** the user can add information at the document level, and provide their own 
implementation of Document
** the user can define how the data of a field is stored and retrieved, and 
provide their own implementation of Fieldable
** the reading of field data is done in the Fieldable
** the writing of field data is done in the FieldsWriter
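
To make the concepts above concrete, here is a minimal, self-contained sketch. It is not taken from the patch: every class is a simplified stand-in that only borrows the names used in the list, the reader side is left out, and the extension list is just an illustrative subset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// All names below are hypothetical stand-ins, not the classes from the patch.
class IndexFormatSketch {

    // Fieldable as an abstract class rather than an interface.
    static abstract class Fieldable {
        final String name;
        Fieldable(String name) { this.name = name; }
    }

    // A field type understood by the default writer.
    static class TextField extends Fieldable {
        final String value;
        TextField(String name, String value) { super(name); this.value = value; }
    }

    // A user-defined field type the default writer knows nothing about.
    static class CustomField extends Fieldable {
        CustomField(String name) { super(name); }
    }

    static abstract class FieldsWriter {
        // "out" stands in for the stored-fields output stream.
        abstract void writeField(Fieldable field, List<String> out);
    }

    static class DefaultFieldsWriter extends FieldsWriter {
        @Override
        void writeField(Fieldable field, List<String> out) {
            // The writer itself checks that the field is of a format it can handle.
            if (!(field instanceof TextField)) {
                throw new IllegalArgumentException("unsupported field type for " + field.name);
            }
            TextField text = (TextField) field;
            out.add(text.name + "=" + text.value);
        }
    }

    // The format is a factory for the writer and declares the file extensions.
    static abstract class IndexFormat {
        abstract FieldsWriter newFieldsWriter();
        abstract List<String> fileExtensions();
    }

    static class DefaultIndexFormat extends IndexFormat {
        @Override FieldsWriter newFieldsWriter() { return new DefaultFieldsWriter(); }
        @Override List<String> fileExtensions() { return Arrays.asList("fdt", "fdx", "fnm"); }
    }

    // The format is attached to the directory at construction time.
    static class Directory {
        final IndexFormat format;
        Directory(IndexFormat format) { this.format = format; }
    }

    public static void main(String[] args) {
        Directory dir = new Directory(new DefaultIndexFormat());
        List<String> out = new ArrayList<>();
        dir.format.newFieldsWriter().writeField(new TextField("title", "Lucene"), out);
        System.out.println(out + " " + dir.format.fileExtensions());
    }
}
```

A custom format would subclass IndexFormat, return its own FieldsWriter, and extend the extension list with its own files.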

* API changes:
** There are new constructors for the directory: constructors that take a 
specified IndexFormat
** new Entry and EntryTable: a generic API for managing a table of data in a file
** FieldInfos now extends EntryTable
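
As a rough illustration of what a generic "table of data in a file" could look like, here is a small sketch. The shapes of Entry and EntryTable, the on-disk layout (a count followed by names) and the toBytes()/load() helpers are all assumptions made for the example; the classes in the patch may look quite different.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical shapes: the real Entry and EntryTable in the patch may differ.
class EntryTableSketch {

    static class Entry {
        final int id;
        final String name;
        Entry(int id, String name) { this.id = id; this.name = name; }
    }

    static class EntryTable {
        private final List<Entry> entries = new ArrayList<>();

        // Appends an entry; its id is its position in the table.
        Entry add(String name) {
            Entry e = new Entry(entries.size(), name);
            entries.add(e);
            return e;
        }

        Entry get(int id) { return entries.get(id); }

        int size() { return entries.size(); }

        // Serializes the table as: entry count, then each entry name.
        byte[] toBytes() {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeInt(entries.size());
                for (Entry e : entries) {
                    out.writeUTF(e.name);
                }
            } catch (IOException ex) {
                throw new UncheckedIOException(ex);
            }
            return bytes.toByteArray();
        }

        // Appends the entries previously written by toBytes() to this table.
        void load(byte[] data) {
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
                int count = in.readInt();
                for (int i = 0; i < count; i++) {
                    add(in.readUTF());
                }
            } catch (IOException ex) {
                throw new UncheckedIOException(ex);
            }
        }
    }

    // FieldInfos becomes one particular table of entries.
    static class FieldInfos extends EntryTable { }

    public static void main(String[] args) {
        FieldInfos infos = new FieldInfos();
        infos.add("title");
        infos.add("body");
        EntryTable copy = new EntryTable();
        copy.load(infos.toBytes());
        System.out.println(copy.size() + " entries, last is " + copy.get(1).name);
    }
}
```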

* Code changes:
** AbstractField becomes Fieldable (Fieldable is no longer an interface).
** FieldsWriter has been split into the abstract class FieldsWriter and 
its default implementation DefaultFieldsWriter. Likewise for FieldsReader and 
DefaultFieldsReader.
** lazy loading has been moved from FieldsReader to Fieldable
** IndexOutput can now write directly from an input stream
** If a field was loaded lazily, the DefaultFieldsWriter copies the 
source input stream directly to the output stream
** the IndexFileNameFilter now takes its list of known file extensions from the 
index format
** each time a temporary RAM directory is created, the index format has to be 
passed: see the diff for CompoundFileReader or IndexWriter
** Some private and/or final classes and methods have been made public
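
The lazy-loading and direct-copy points can be sketched as follows. This is only an illustration under simplifying assumptions: a byte array slice stands in for the index input stream, and LazyField, copyRawTo() and writeField() are hypothetical names, not the methods of the patch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: a byte array slice stands in for the index input stream.
class LazyFieldSketch {

    static class LazyField {
        private final byte[] source; // stored bytes the field was read from
        private final int offset;
        private final int length;
        private String value;        // decoded only on first access

        LazyField(byte[] source, int offset, int length) {
            this.source = source;
            this.offset = offset;
            this.length = length;
        }

        boolean isLoaded() { return value != null; }

        // The field itself does the (lazy) reading.
        String stringValue() {
            if (value == null) {
                value = new String(source, offset, length, StandardCharsets.UTF_8);
            }
            return value;
        }

        // Copies the stored bytes without decoding them.
        void copyRawTo(OutputStream out) throws IOException {
            out.write(source, offset, length);
        }
    }

    // Writer side: a field that was never loaded is copied byte for byte.
    static void writeField(LazyField field, OutputStream out) {
        try {
            if (field.isLoaded()) {
                out.write(field.stringValue().getBytes(StandardCharsets.UTF_8));
            } else {
                field.copyRawTo(out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] stored = "..hello..".getBytes(StandardCharsets.UTF_8);
        LazyField field = new LazyField(stored, 2, 5);
        ByteArrayOutputStream merged = new ByteArrayOutputStream();
        writeField(field, merged);
        System.out.println(merged.toString() + " loaded=" + field.isLoaded());
    }
}
```

The point of the raw copy is that a merge never has to decode a field value it does not need.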

* Last worries:
** quite a big one in fact, and I don't know how to handle it: every RMI test 
fails because of:
{noformat}
error unmarshalling return; nested exception is:
    [junit]     java.io.InvalidClassException: org.apache.lucene.document.Field; no valid constructor
    [junit] java.rmi.UnmarshalException: error unmarshalling return; nested exception is:
    [junit]     java.io.InvalidClassException: org.apache.lucene.document.Field; no valid constructor
    [junit]     at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:157)
{noformat}
** a function is public and it shouldn't be: see Fieldable.setLazyData()
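
On the RMI failure: Java serialization reports "no valid constructor" when a Serializable class has a non-serializable superclass that lacks an accessible no-arg constructor. Whether that is what happens to Field in the patch is an assumption (it would fit if the new abstract Fieldable is not itself Serializable); the sketch below only reproduces the general mechanism with hypothetical classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical classes reproducing the general mechanism, not the Lucene code.
class NoValidConstructorSketch {

    // Not Serializable, and no no-arg constructor.
    static abstract class Base {
        String name;
        Base(String name) { this.name = name; }
    }

    // Serializable subclass of a non-serializable base: deserialization fails.
    static class Broken extends Base implements Serializable {
        private static final long serialVersionUID = 1L;
        Broken(String name) { super(name); }
    }

    // One possible fix: make the abstract base itself Serializable.
    static abstract class FixedBase implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        FixedBase(String name) { this.name = name; }
    }

    static class Fixed extends FixedBase {
        private static final long serialVersionUID = 1L;
        Fixed(String name) { super(name); }
    }

    // Serializes and deserializes in memory, roughly what RMI does with return values.
    static Object roundTrip(Object o) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(o);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        try {
            roundTrip(new Broken("title"));
        } catch (UncheckedIOException e) {
            System.out.println("Broken: " + e.getCause());
        }
        Fixed copy = (Fixed) roundTrip(new Fixed("title"));
        System.out.println("Fixed: " + copy.name);
    }
}
```

Giving the non-serializable base a no-arg constructor would also remove the exception, but the fields declared in that base would then not be carried over, so making the base Serializable is usually the closer match to the previous behavior.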

I have added an example implementation to the patch that uses this feature: 
look at org.apache.lucene.index.rdf

I know this is a big patch but I think the API has not been broken, and I would 
appreciate comments on this.

> Extendable writer and reader of field data
> ------------------------------------------
>
>                 Key: LUCENE-662
>                 URL: http://issues.apache.org/jira/browse/LUCENE-662
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Nicolas Lalevée
>            Priority: Minor
>         Attachments: generic-fieldIO-2.patch, generic-fieldIO.patch
>
>
> As discussed on the dev mailing list, I have modified Lucene to make it 
> possible to define how the data of a field is written to and read from the 
> index.
> Basically, I have introduced the notion of IndexFormat. It is in fact a 
> factory of FieldsWriter and FieldsReader. So the IndexReader, the IndexWriter 
> and the SegmentMerger use this factory instead of doing a "new 
> FieldsReader/Writer()".
> I have also introduced the notion of FieldData. It handles all the data of a 
> field, as well as writing it to and reading it from a stream. I have done it 
> this way because in the current design of Lucene, Fieldable is an interface, 
> so methods with protected or package visibility cannot be defined.
> A FieldsWriter just writes data into a stream via the FieldData of the field.
> A FieldsReader instantiates a FieldData depending on the field name. Then it 
> uses the field data to read the stream. And finally it instantiates a Field 
> with the field data.
> About compatibility, I think it is kept, as I have written a 
> DefaultIndexFormat that provides a DefaultFieldsWriter and a 
> DefaultFieldsReader. These implementations do exactly the job that is done 
> today.
> To achieve this modification, some classes and methods had to be moved from 
> private and/or final to public or protected.
> About the lazy fields, I have implemented them in a more general way in the 
> implementation of the abstract class FieldData, so it will be totally 
> transparent for the Lucene user who extends FieldData. The stream is 
> kept in the FieldData and used as soon as stringValue() (or something else) 
> is called. Implementing it this way allowed me to handle the recently 
> introduced LOAD_FOR_MERGE; it is just a lazy field data, and when read() is 
> called on this lazy field data, the saved input stream is directly copied to 
> the output stream.
> I have a last issue with this patch. The current design allows reading an 
> index in an old format and just doing a writer.addIndexes() into a new format. 
> With the new design, you cannot, because the writer will use the 
> FieldData.write provided by the reader.
> enjoy !
