> Saving the index in text format would also be a fun codec (in 4.0) to create > :)
A codec like that would be welcome :) On Wed, Sep 22, 2010 at 5:31 AM, Michael McCandless <luc...@mikemccandless.com> wrote: > Saving the index in text format would also be a fun codec (in 4.0) to create > :) > > Ie, the codec would be read/write. The performance wouldn't be great, > but it'd be neat for debugging, teaching, transparency purposes... > > Mike > > On Tue, Sep 21, 2010 at 9:26 PM, Lance Norskog <goks...@gmail.com> wrote: >> The Lucene CheckIndex program opens an index and walks all of the data >> structures. It is a good start for you. >> >> Sahin Buyrukbilen wrote: >>> >>> Thank you Uwe, I will read the docs and try to do it, however do you have >>> an >>> example code? I need because I am not very familiar with Java. >>> >>> Thank you. >>> >>> Sahin >>> >>> On Tue, Sep 21, 2010 at 12:29 PM, Uwe Schindler<u...@thetaphi.de> wrote: >>> >>> >>>> >>>> Hi, >>>> >>>> Retrieve a TermEnum and iterate it. By that you get all terms and can >>>> retrieve the docFreq, which is the second column in your table. Finally >>>> for >>>> each term you position the TermDocs enum on this term to get all document >>>> ids. Read docs of IndexReader/TermEnum/TermDocs about this. >>>> >>>> Uwe >>>> >>>> ----- >>>> Uwe Schindler >>>> H.-H.-Meier-Allee 63, D-28213 Bremen >>>> http://www.thetaphi.de >>>> eMail: u...@thetaphi.de >>>> >>>> >>>>> >>>>> -----Original Message----- >>>>> From: Sahin Buyrukbilen [mailto:sahin.buyrukbi...@gmail.com] >>>>> Sent: Tuesday, September 21, 2010 9:12 AM >>>>> To: java-user@lucene.apache.org >>>>> Subject: How to export lucene index to a simple text file? >>>>> >>>>> Hi, >>>>> >>>>> I am currently working on a project about private information retrieval >>>>> >>>> >>>> and I >>>> >>>>> >>>>> need to have an inverted index file in txt format as follows: >>>>> >>>>> Term t freq t Inverted list for t >>>>> >>>>> ------------------------------------------------------------------------- >>>>> and 1<6, 0.159> >>>>> big 2<2, 0.148> <3, 0.088> >>>>> dark 1<6, 0.079> >>>>> . >>>>> . >>>>> . >>>>> . >>>>> >>>>> here the<number1, number2> pairs are indicating: number1: doc ID, where >>>>> term t exist with a rank of number2. >>>>> >>>>> I have created an index from 5492 txt files, however the index is >>>>> >>>> >>>> composed >>>> of >>>> >>>>> >>>>> different files and most of the data is not in the text format. >>>>> >>>>> could somebody guide me to achieve this? >>>>> >>>>> Thank you >>>>> >>>>> Sahin. >>>>> >>>> >>>> --------------------------------------------------------------------- >>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >>>> For additional commands, e-mail: java-user-h...@lucene.apache.org >>>> >>>> >>>> >>> >>> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-user-h...@lucene.apache.org >> >> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org