Hi Mike,

Thanks for making the fix and changing the display from bytes to utf8. It needs a very minor change: the latest fix converts to utf8 if you give a field argument on the command line, but still shows bytes if you don't.
Line 89 should parallel line 70 and use term.utf8ToString() instead of term.toString():

70  tiq.insertWithOverflow(new TermInfo(new Term(field, term.utf8ToString()), termsEnum.docFreq()));
89  tiq.insertWithOverflow(new TermInfo(new Term(field, term.toString()), terms.docFreq()));

Tom

-----Original Message-----
From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: Wednesday, April 14, 2010 3:50 PM
To: java-dev@lucene.apache.org
Subject: Re: Bug in contrib/misc/HighFreqTerms.java?

OK I committed the fix.

I ran it on a flex wikipedia index I had... it produces output like this:

body:[3c 21 2d 2d] 509050
body:[73 68 6f 75 6c 64] 515495
body:[74 68 65 6e] 525176
body:[74 69 74 6c 65] 525361
body:[5b 5b 55 6e 69 74 65 64] 532586
body:[6b 6e 6f 77 6e] 533558
body:[75 6e 64 65 72] 536480
body:[55 6e 69 74 65 64] 543746

Which is not very readable, but, it does this because flex terms are arbitrary byte[], not necessarily utf8... maybe we should fix it to print both hex and String if we assume bytes are utf8?

Mike

On Wed, Apr 14, 2010 at 3:25 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
> Ugh, I'll fix this.
>
> With the new flex API, you can't ask a composite (Multi/DirReader) for
> its postings -- you have to go through the static methods on
> MultiFields. I'm trying to put some distance b/w IndexReader and
> composite readers... because I'd like to eventually deprecate them.
> Ie, the composite readers should "hold" an ordered collection of
> sub-readers, but should not themselves implement IndexReader's API, I
> think.
>
> Thanks for raising this Tom,
>
> Mike
>
> On Wed, Apr 14, 2010 at 2:14 PM, Burton-West, Tom <tburt...@umich.edu> wrote:
>> When I try to run HighFreqTerms.java in Lucene Revision: 933722 I get
>> the exception appended below. I believe the line of code involved is a
>> result of the flex indexing merge. Should I post this as a comment to
>> LUCENE-2370 (Reintegrate flex branch into trunk)?
>>
>> Or is there simply something wrong with my configuration?
>>
>> Exception in thread "main" java.lang.UnsupportedOperationException: please
>> use MultiFields.getFields if you really need a top level Fields (NOTE that
>> it's usually better to work per segment instead)
>>     at org.apache.lucene.index.DirectoryReader.fields(DirectoryReader.java:762)
>>     at org.apache.lucene.misc.HighFreqTerms.main(HighFreqTerms.java:71)
>>
>> Tom Burton-West
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
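
For reference, Mike's suggestion in this thread (print both hex and String when the bytes can be assumed to be utf8) could look roughly like the sketch below. It uses only the plain JDK, not the Lucene API. The class TermDisplay and its methods toHex, tryUtf8 and describe are hypothetical names made up for illustration; they are not part of HighFreqTerms or of the committed fix. The sample bytes are taken from the output quoted above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration only; not part of Lucene or HighFreqTerms. */
class TermDisplay {

    /** Formats bytes as bracketed, space-separated hex, e.g. [74 68 65 6e]. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative byte values do not sign-extend.
            sb.append(String.format("%02x", bytes[i] & 0xff));
        }
        return sb.append(']').toString();
    }

    /** Strictly decodes the bytes as UTF-8; returns null if they are not valid UTF-8. */
    static String tryUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    /** Builds "field:[hex] text docFreq", omitting the text when the bytes are not UTF-8. */
    static String describe(String field, byte[] termBytes, int docFreq) {
        String hex = toHex(termBytes);
        String text = tryUtf8(termBytes);
        String shown = (text == null) ? hex : hex + " " + text;
        return field + ":" + shown + " " + docFreq;
    }

    public static void main(String[] args) {
        // Bytes 74 68 65 6e from the quoted output decode to "then".
        byte[] then = {0x74, 0x68, 0x65, 0x6e};
        System.out.println(describe("body", then, 525176));

        // 0xff never occurs in valid UTF-8, so only the hex form is shown.
        byte[] notUtf8 = {(byte) 0xff, (byte) 0xfe};
        System.out.println(describe("body", notUtf8, 1));
    }
}
```

Run as is, this prints body:[74 68 65 6e] then 525176 followed by body:[ff fe] 1. Decoding strictly (REPORT rather than the default replacement behaviour) is a design choice in this sketch: terms that are not valid utf8 fall back to hex only, instead of showing replacement characters.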