[ http://issues.apache.org/jira/browse/LUCENE-662?page=all ]
Nicolas Lalevée updated LUCENE-662:
-----------------------------------

    Attachment: generic-fieldIO-2.patch

I think I got it. What was troubling about the last patch was the notion of FieldData I had added, so I removed it. Let me summarize the differences between the trunk and my patch:

* The concepts:
** an IndexFormat defines which FieldsWriter and FieldsReader to use
** an IndexFormat defines the file extensions in use, so the user can add their own files
** the format of an index is attached to the Directory
** the index format as a whole isn't customizable, only part of it. Some functions are private or "default" (package-private), so the Lucene user won't have access to them: they are Lucene internals. Others are public or protected: they can be redefined.
** Lucene now provides an API to add files that are tables of data, as FieldInfos is
** it is up to the FieldsWriter implementation to check that the field to write is of the expected format (basically with an instanceof check)
** the user can add information at the document level and provide their own implementation of Document
** the user can define how the data of a field is stored and retrieved, and provide their own implementation of Fieldable
** the reading of field data is done in the Fieldable
** the writing of the field is done in the FieldsWriter
* API changes:
** There are new constructors of the directory: constructors that take an IndexFormat
** new Entry and EntryTable: a generic API for managing a table of data in a file
** FieldInfos now extends EntryTable
* Code changes:
** AbstractField becomes Fieldable (Fieldable is no longer an interface).
** FieldsWriter has been split into the abstract class FieldsWriter and its default implementation DefaultFieldsWriter. Likewise for FieldsReader and DefaultFieldsReader.
** the lazy loading has been moved from FieldsReader to Fieldable
** IndexOutput can now write directly from an input stream
** If a field was loaded lazily, the DefaultFieldsWriter copies the source input stream directly to the output stream
** the IndexFileNameFilter now takes its list of known file extensions from the index format
** each time a temporary RAM directory is created, the index format has to be passed: see the diff for CompoundFileReader or IndexWriter
** Some private and/or final members have been made public
* Last worries:
** quite a big one in fact, but I don't know how to handle it: every RMI test fails because of:
{noformat}
error unmarshalling return; nested exception is:
    [junit]     java.io.InvalidClassException: org.apache.lucene.document.Field; no valid constructor
    [junit] java.rmi.UnmarshalException: error unmarshalling return; nested exception is:
    [junit]     java.io.InvalidClassException: org.apache.lucene.document.Field; no valid constructor
    [junit]     at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:157)
{noformat}
** one function is public and shouldn't be: see Fieldable.setLazyData()

I have added to the patch an example implementation that uses this feature: look at org.apache.lucene.index.rdf

I know this is a big patch, but I think the API has not been broken, and I would appreciate comments on this.

> Extendable writer and reader of field data
> ------------------------------------------
>
>                 Key: LUCENE-662
>                 URL: http://issues.apache.org/jira/browse/LUCENE-662
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Nicolas Lalevée
>            Priority: Minor
>         Attachments: generic-fieldIO-2.patch, generic-fieldIO.patch
>
>
> As discussed on the dev mailing list, I have modified Lucene to allow defining how the data of a field is written and read in the index.
> Basically, I have introduced the notion of IndexFormat. It is in fact a factory of FieldsWriter and FieldsReader.
> So the IndexReader, the IndexWriter and the SegmentMerger use this factory instead of doing a "new FieldsReader/Writer()".
> I have also introduced the notion of FieldData. It handles all the data of a field, and also the writing to and reading from a stream. I have done it this way because in the current design of Lucene, Fieldable is an interface, so methods with protected or package visibility cannot be defined.
> A FieldsWriter just writes data into a stream via the FieldData of the field.
> A FieldsReader instantiates a FieldData depending on the field name. Then it uses the field data to read the stream. And finally it instantiates a Field with the field data.
> About compatibility, I think it is kept, as I have written a DefaultIndexFormat that provides a DefaultFieldsWriter and a DefaultFieldsReader. These implementations do exactly the job that is done today.
> To achieve this modification, some classes and methods had to be changed from private and/or final to public or protected.
> About the lazy fields, I have implemented them in a more general way in the implementation of the abstract class FieldData, so it will be totally transparent for the Lucene user who extends FieldData. The stream is kept in the FieldData and used as soon as stringValue() (or something else) is called. Implementing it this way allowed me to handle the recently introduced LOAD_FOR_MERGE; it is just a lazy field data, and when read() is called on this lazy field data, the saved input stream is copied directly to the output stream.
> I have one last issue with this patch. The current design allows reading an index in an old format and just doing a writer.addIndexes() into a new format. With the new design, you cannot, because the writer will use the FieldData.write provided by the reader.
> Enjoy!

--
This message is automatically generated by JIRA.
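
For readers who have not opened the patch, the central idea of this issue (an IndexFormat acting as a factory of FieldsWriter and FieldsReader, with a default implementation that keeps today's behaviour) can be sketched roughly as below. This is only an illustration under stated assumptions, not code from generic-fieldIO-2.patch: the method names getFieldsWriter, getFieldsReader, writeField and readField, the LengthPrefixedIndexFormat class, and the use of java.io streams in place of Lucene's IndexOutput/IndexInput are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class IndexFormatSketch {

    /** Writes the stored value of one field (hypothetical, heavily simplified). */
    static abstract class FieldsWriter {
        abstract void writeField(DataOutputStream out, String value) throws IOException;
    }

    /** Reads back what the matching FieldsWriter wrote (hypothetical). */
    static abstract class FieldsReader {
        abstract String readField(DataInputStream in) throws IOException;
    }

    /** The factory: callers ask the format for a writer/reader instead of calling "new". */
    interface IndexFormat {
        FieldsWriter getFieldsWriter();
        FieldsReader getFieldsReader();
    }

    /** Stands in for the default format that reproduces the existing behaviour. */
    static class DefaultIndexFormat implements IndexFormat {
        @Override
        public FieldsWriter getFieldsWriter() {
            return new FieldsWriter() {
                @Override
                void writeField(DataOutputStream out, String value) throws IOException {
                    out.writeUTF(value);
                }
            };
        }

        @Override
        public FieldsReader getFieldsReader() {
            return new FieldsReader() {
                @Override
                String readField(DataInputStream in) throws IOException {
                    return in.readUTF();
                }
            };
        }
    }

    /** A user-defined format (assumption): explicit length prefix plus UTF-8 bytes. */
    static class LengthPrefixedIndexFormat implements IndexFormat {
        @Override
        public FieldsWriter getFieldsWriter() {
            return new FieldsWriter() {
                @Override
                void writeField(DataOutputStream out, String value) throws IOException {
                    byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
                    out.writeInt(bytes.length);
                    out.write(bytes);
                }
            };
        }

        @Override
        public FieldsReader getFieldsReader() {
            return new FieldsReader() {
                @Override
                String readField(DataInputStream in) throws IOException {
                    byte[] bytes = new byte[in.readInt()];
                    in.readFully(bytes);
                    return new String(bytes, StandardCharsets.UTF_8);
                }
            };
        }
    }

    /** Calling code depends only on IndexFormat, never on a concrete writer or reader. */
    static String roundTrip(IndexFormat format, String value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            format.getFieldsWriter().writeField(out, value);
            out.flush();
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
            return format.getFieldsReader().readField(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new DefaultIndexFormat(), "title"));
        System.out.println(roundTrip(new LengthPrefixedIndexFormat(), "title"));
    }
}
```

The point of the sketch is the shape of the design: roundTrip plays the role of IndexReader, IndexWriter and SegmentMerger, which only ever see the factory, so swapping the format changes the on-disk layout without touching the callers.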