Saving the index in text format would also be a fun codec (in 4.0) to create :)
Ie, the codec would be read/write. The performance wouldn't be great, but it'd be neat for debugging, teaching, transparency purposes... Mike On Tue, Sep 21, 2010 at 9:26 PM, Lance Norskog <goks...@gmail.com> wrote: > The Lucene CheckIndex program opens an index and walks all of the data > structures. It is a good start for you. > > Sahin Buyrukbilen wrote: >> >> Thank you Uwe, I will read the docs and try to do it, however do you have >> an >> example code? I need because I am not very familiar with Java. >> >> Thank you. >> >> Sahin >> >> On Tue, Sep 21, 2010 at 12:29 PM, Uwe Schindler<u...@thetaphi.de> wrote: >> >> >>> >>> Hi, >>> >>> Retrieve a TermEnum and iterate it. By that you get all terms and can >>> retrieve the docFreq, which is the second column in your table. Finally >>> for >>> each term you position the TermDocs enum on this term to get all document >>> ids. Read docs of IndexReader/TermEnum/TermDocs about this. >>> >>> Uwe >>> >>> ----- >>> Uwe Schindler >>> H.-H.-Meier-Allee 63, D-28213 Bremen >>> http://www.thetaphi.de >>> eMail: u...@thetaphi.de >>> >>> >>>> >>>> -----Original Message----- >>>> From: Sahin Buyrukbilen [mailto:sahin.buyrukbi...@gmail.com] >>>> Sent: Tuesday, September 21, 2010 9:12 AM >>>> To: java-user@lucene.apache.org >>>> Subject: How to export lucene index to a simple text file? >>>> >>>> Hi, >>>> >>>> I am currently working on a project about private information retrieval >>>> >>> >>> and I >>> >>>> >>>> need to have an inverted index file in txt format as follows: >>>> >>>> Term t freq t Inverted list for t >>>> >>>> ------------------------------------------------------------------------- >>>> and 1<6, 0.159> >>>> big 2<2, 0.148> <3, 0.088> >>>> dark 1<6, 0.079> >>>> . >>>> . >>>> . >>>> . >>>> >>>> here the<number1, number2> pairs are indicating: number1: doc ID, where >>>> term t exist with a rank of number2. >>>> >>>> I have created an index from 5492 txt files, however the index is >>>> >>> >>> composed >>> of >>> >>>> >>>> different files and most of the data is not in the text format. >>>> >>>> could somebody guide me to achieve this? >>>> >>>> Thank you >>>> >>>> Sahin. >>>> >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >>> For additional commands, e-mail: java-user-h...@lucene.apache.org >>> >>> >>> >> >> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org