http://issues.apache.org/bugzilla/show_bug.cgi?id=34629

------- Additional Comments From [EMAIL PROTECTED] 2005-04-28 16:04 -------
(In reply to comment #2)
> Nicolas, thanks for the contribution! I took a quick look at the ZIP file.
> Would it be possible for you to describe (here and/or in the Javadocs) how
> these 12+ classes work to provide Document update functionality?

The goal of this contribution is to overwrite only the files that contain the term posting lists (.tis, .tii, .frq, etc.).

In the Lucene API, the term posting lists are accessible through the methods IndexReader.terms() (enumerates all the terms) and IndexReader.termPositions() (for a specific term, enumerates each tuple <doc number, freq, <position>^freq>).

So if I modify the output of these 2 methods (add new terms, delete relations between documents and terms, etc.) and write the result back into the Lucene index, I create a new Lucene term posting list. That is what this contribution does!

To do this, I created an interface called TermProducter containing these 2 methods (terms() and termPositions()). A class implementing this interface has to produce these 2 kinds of output, so it produces the posting lists. For example, an IndexReader could implement this interface, but you can also create your own term posting list producer, or create a TermProducter that modifies the output of the original IndexReader.

Then the TermWriter class, which takes a TermProducter and a Lucene index as input, rewrites the Lucene term posting lists with the content of the TermProducter.

So now the question is: how can I modify the term posting lists? What are my tools? There are 2 types of tools: TermGenerator and TermTransformer.

* The TermGenerator interface. It generates a TermProducter instance. Its goal is to create a new posting list. The interface is simple:

    public TermProducter createTermProducter();

  There are 2 proposed implementations:
  - TermReader.
    An IndexReader wrapper implementing TermProducter.
  - TermAdder. You can create your own posting list by adding term/document relations. It is like a virtual index.

* The TermTransformer interface. It modifies the output of a TermProducter. The interface is:

    public TermProducter transform(TermProducter producter);

  There are 2 proposed implementations:
  - TermFilter. Filters out some term/doc relations.
  - TermReplacer. Replaces some term/doc relations with other relations.

* There is also a special TermProducter implementation called TermMerger. It merges several TermProducters (useful). Its methods are:

    void add(TermProducter producter)
    terms()
    termPositions()

Now we can play by combining these tools to create a kind of pipeline. For example, an update process:

    (1) TermReader ----> (2) TermFilter ----> (4) TermMerger ( ----> (5) TermWriter )
                                                    |
    (3) TermAdder ----------------->----------------+

  1 - we read the Lucene posting lists
  2 - we delete some terms
  3 - we add new terms
  4 - we merge the 2 TermProducters to create the final TermProducter
  5 - we write the TermProducter information into the Lucene index

This design is flexible: if I just want to replace terms, I can use this simpler, optimized process:

    (1) TermReader ----> (2) TermReplacer ( ----> TermWriter )

So you can create your own pipeline of term transformations.

--- A COMPLETE EXAMPLE ---

Use case: I have to delete a term in several documents.

1 - I have to know all the Lucene document numbers.

The main class is IndexUpdater. It contains a TermWriter and lets you select the desired documents. So I must create an instance:

    IndexUpdater updater = new IndexUpdater(indexReader);   // indexReader is an IndexReader

After this, I can execute a Lucene query to select all the desired documents:

    DocumentSelection docsel = updater.selectDoc(query);    // query is a Query

OK, now I have a DocumentSelection instance that lets a TermGenerator/TermTransformer know whether a document is selected or not when deleting the terms.

2 - Delete their relations with the desired terms.
So now I create a TermFilter and delete the term in the selected documents:

    TermFilter filter = new TermFilter();
    filter.deleteTerm(new Term("field", "deletedvalue"), docsel);

3 - Now I create a pipeline like this:

    TermReader ----> TermFilter ( ----> TermWriter )

IndexUpdater has a method that creates a TermReader for the Lucene index:

    TermReader reader = updater.getTermReader();
    TermProducter finalProducter = filter.transform(reader.createTermProducter());
    updater.setTermProducter(finalProducter);

4 - I close the updater, which writes the new posting lists into the index:

    updater.close();

OK, is it clear?

PS:
1 - Sorry for my English.
2 - I know this contribution is not perfect (class names, design, implementation) and it can certainly be improved, but it is a first step toward easy updating of the posting lists, which Lucene lacks.
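
To make the pipeline idea concrete, here is a minimal, self-contained sketch of the read / filter / add / merge flow described above. It does not use the classes from the ZIP file or any Lucene API: Producer, Adder, filter() and merge() are hypothetical stand-ins for TermProducter, TermAdder, TermFilter and TermMerger, and a posting list is reduced to a map from term to document numbers (no frequencies or positions). Treat it as an illustration of the design under those assumptions, not as the actual implementation.

```java
import java.util.*;

public class PostingPipelineSketch {

    /** Hypothetical stand-in for TermProducter: term -> sorted doc numbers. */
    interface Producer {
        SortedMap<String, SortedSet<Integer>> postings();
    }

    /** Hypothetical stand-in for TermAdder: a "virtual index" built by hand. */
    static class Adder implements Producer {
        private final SortedMap<String, SortedSet<Integer>> map = new TreeMap<>();

        Adder add(String term, int doc) {
            map.computeIfAbsent(term, t -> new TreeSet<>()).add(doc);
            return this;
        }

        @Override
        public SortedMap<String, SortedSet<Integer>> postings() {
            return map;
        }
    }

    /** Stand-in for a TermFilter transform: removes one term from the selected docs. */
    static Producer filter(Producer in, String term, Set<Integer> selectedDocs) {
        return () -> {
            SortedMap<String, SortedSet<Integer>> out = new TreeMap<>();
            in.postings().forEach((t, docs) -> {
                SortedSet<Integer> kept = new TreeSet<>(docs);
                if (t.equals(term)) {
                    kept.removeAll(selectedDocs);
                }
                if (!kept.isEmpty()) {
                    out.put(t, kept);   // terms left with no documents disappear
                }
            });
            return out;
        };
    }

    /** Stand-in for TermMerger: the union of several producers' postings. */
    static Producer merge(Producer... inputs) {
        return () -> {
            SortedMap<String, SortedSet<Integer>> out = new TreeMap<>();
            for (Producer p : inputs) {
                p.postings().forEach((t, docs) ->
                        out.computeIfAbsent(t, k -> new TreeSet<>()).addAll(docs));
            }
            return out;
        };
    }

    public static void main(String[] args) {
        // (1) the "existing index", modelled here with an Adder for simplicity
        Producer original = new Adder().add("apple", 0).add("apple", 1).add("pear", 1);
        // (2) delete the term "apple" from document 1
        Producer filtered = filter(original, "apple", Collections.singleton(1));
        // (3) add a new term to document 1
        Producer added = new Adder().add("plum", 1);
        // (4) merge both producers; a real writer step (5) would persist this
        Producer result = merge(filtered, added);
        System.out.println(result.postings());
    }
}
```

Each stage only wraps the previous producer, so nothing is computed until the final postings() call. That mirrors the idea that only the last step (TermWriter in the contribution) consumes the whole chain.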