Ahh you're right! Though, really, we should not be converting to String (flex terms in general are an arbitrary byte[], not necessarily utf8). We should just use a BytesRef directly in the key.
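To illustrate why the raw bytes are the safer thing to hold on to, here is a minimal sketch in plain Java. It always shows the hex form and adds the decoded text only when the bytes are well-formed UTF-8. It deliberately does not use Lucene's BytesRef, and it is not the code in HighFreqTerms: the class TermDisplay and the methods toHex, utf8OrNull and display are hypothetical names made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Lucene.
class TermDisplay {

    // Render the bytes as space-separated two-digit hex, e.g. "[74 68 65 6e]".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", bytes[i] & 0xff));
        }
        return sb.append(']').toString();
    }

    // Strictly decode as UTF-8; returns null if the bytes are not well-formed UTF-8.
    static String utf8OrNull(byte[] bytes) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    // Hex is always shown; the decoded text is appended only when decoding succeeds.
    static String display(byte[] bytes) {
        String text = utf8OrNull(bytes);
        return text == null ? toHex(bytes) : toHex(bytes) + " (" + text + ")";
    }

    public static void main(String[] args) {
        byte[] then = {0x74, 0x68, 0x65, 0x6e};          // valid UTF-8
        byte[] notUtf8 = {(byte) 0xff, (byte) 0xfe};     // 0xff never occurs in UTF-8
        System.out.println(display(then));      // prints "[74 68 65 6e] (then)"
        System.out.println(display(notUtf8));   // prints "[ff fe]"
    }
}
```

The byte[] stays the source of truth here and the String is derived only for display, which is the same reason for keeping the term bytes, rather than a String, in the key.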
Can you open an issue for this Tom? Thanks!

Mike

On Fri, Apr 16, 2010 at 2:41 PM, Burton-West, Tom <tburt...@umich.edu> wrote:
> Hi Mike,
>
> Thanks for making the fix and changing the display from bytes to utf8. It
> needs a very minor change:
> The latest fix converts to utf8 if you give a field argument on the command
> line but still shows bytes if you don't.
>
> Line 89 should parallel line 70 and use term.utf8ToString() instead of
> term.toString;
>
> 70 tiq.insertWithOverflow(new TermInfo(new Term(field,
> term.utf8ToString()), termsEnum.docFreq()));
> 89 tiq.insertWithOverflow(new TermInfo(new Term(field,
> term.toString()), terms.docFreq()));
>
> Tom
>
> -----Original Message-----
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Wednesday, April 14, 2010 3:50 PM
> To: java-dev@lucene.apache.org
> Subject: Re: Bug in contrib/misc/HighFreqTerms.java?
>
> OK I committed the fix. I ran it on a flex wikipedia index I had...
> it produces output like this:
>
> body:[3c 21 2d 2d] 509050
> body:[73 68 6f 75 6c 64] 515495
> body:[74 68 65 6e] 525176
> body:[74 69 74 6c 65] 525361
> body:[5b 5b 55 6e 69 74 65 64] 532586
> body:[6b 6e 6f 77 6e] 533558
> body:[75 6e 64 65 72] 536480
> body:[55 6e 69 74 65 64] 543746
>
> Which is not very readable, but, it does this because flex terms are
> arbitrary byte[], not necessarily utf8... maybe we should fix it to
> print both hex and String if we assume bytes are utf8?
>
> Mike
>
> On Wed, Apr 14, 2010 at 3:25 PM, Michael McCandless
> <luc...@mikemccandless.com> wrote:
>> Ugh, I'll fix this.
>>
>> With the new flex API, you can't ask a composite (Multi/DirReader) for
>> its postings -- you have to go through the static methods on
>> MultiFields. I'm trying to put some distance b/w IndexReader and
>> composite readers... because I'd like to eventually deprecate them.
>> Ie, the composite readers should "hold" an ordered collection of
>> sub-readers, but should not themselves implement IndexReader's API, I
>> think.
>>
>> Thanks for raising this Tom,
>>
>> Mike
>>
>> On Wed, Apr 14, 2010 at 2:14 PM, Burton-West, Tom <tburt...@umich.edu> wrote:
>>> When I try to run HighFreqTerms.java in Lucene Revision: 933722 I get the
>>> the exception appended below. I believe the line of code involved is a
>>> result of the flex indexing merge. Should I post this as a comment to
>>> LUCENE-2370 (Reintegrate flex branch into trunk)?
>>>
>>> Or is there simply something wrong with my configuration?
>>>
>>> Exception in thread "main" java.lang.UnsupportedOperationException: please
>>> use MultiFields.getFields if you really need a top level Fields (NOTE that
>>> it's usually better to work per segment instead)
>>> at
>>> org.apache.lucene.index.DirectoryReader.fields(DirectoryReader.java:762)
>>> at org.apache.lucene.misc.HighFreqTerms.main(HighFreqTerms.java:71)
>>>
>>> Tom Burton-West
>>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org