Ok, opened LUCENE-2403. I could make the change to make the two lines consistent, but to use a BytesRef directly, wouldn't Term.java need to use BytesRef instead of String? Or is there a new flex "Term" class that uses a BytesRef?
Otherwise, TermInfo could be changed to use the field name and a BytesRef instead of a Term.

Tom

-----Original Message-----
From: Michael McCandless [mailto:[email protected]]
Sent: Saturday, April 17, 2010 11:43 AM
To: [email protected]
Subject: Re: Fix to contrib/misc/HighFreqTerms.java

Ahh you're right!

Though, really, we should not be converting to String (flex terms in general are an arbitrary byte[], not necessarily utf8). We should just use a BytesRef directly in the key.

Can you open an issue for this Tom? Thanks!

Mike

On Fri, Apr 16, 2010 at 2:41 PM, Burton-West, Tom <[email protected]> wrote:
> Hi Mike,
>
> Thanks for making the fix and changing the display from bytes to utf8. It
> needs a very minor change:
> The latest fix converts to utf8 if you give a field argument on the command
> line but still shows bytes if you don't.
>
> Line 89 should parallel line 70 and use term.utf8ToString() instead of
> term.toString():
>
> 70 tiq.insertWithOverflow(new TermInfo(new Term(field, term.utf8ToString()), termsEnum.docFreq()));
> 89 tiq.insertWithOverflow(new TermInfo(new Term(field, term.toString()), terms.docFreq()));
>
> Tom
>
> -----Original Message-----
> From: Michael McCandless [mailto:[email protected]]
> Sent: Wednesday, April 14, 2010 3:50 PM
> To: [email protected]
> Subject: Re: Bug in contrib/misc/HighFreqTerms.java?
>
> OK I committed the fix. I ran it on a flex wikipedia index I had...
> it produces output like this:
>
> body:[3c 21 2d 2d] 509050
> body:[73 68 6f 75 6c 64] 515495
> body:[74 68 65 6e] 525176
> body:[74 69 74 6c 65] 525361
> body:[5b 5b 55 6e 69 74 65 64] 532586
> body:[6b 6e 6f 77 6e] 533558
> body:[75 6e 64 65 72] 536480
> body:[55 6e 69 74 65 64] 543746
>
> Which is not very readable, but it does this because flex terms are
> arbitrary byte[], not necessarily utf8... maybe we should fix it to
> print both hex and String if we assume bytes are utf8?
>
> Mike
>
> On Wed, Apr 14, 2010 at 3:25 PM, Michael McCandless
> <[email protected]> wrote:
>> Ugh, I'll fix this.
>>
>> With the new flex API, you can't ask a composite (Multi/DirReader) for
>> its postings -- you have to go through the static methods on
>> MultiFields. I'm trying to put some distance between IndexReader and
>> composite readers... because I'd like to eventually deprecate them.
>> I.e., the composite readers should "hold" an ordered collection of
>> sub-readers, but should not themselves implement IndexReader's API, I
>> think.
>>
>> Thanks for raising this Tom,
>>
>> Mike
>>
>> On Wed, Apr 14, 2010 at 2:14 PM, Burton-West, Tom <[email protected]> wrote:
>>> When I try to run HighFreqTerms.java in Lucene Revision: 933722 I get the
>>> exception appended below. I believe the line of code involved is a
>>> result of the flex indexing merge. Should I post this as a comment to
>>> LUCENE-2370 (Reintegrate flex branch into trunk)?
>>>
>>> Or is there simply something wrong with my configuration?
>>>
>>> Exception in thread "main" java.lang.UnsupportedOperationException: please
>>> use MultiFields.getFields if you really need a top level Fields (NOTE that
>>> it's usually better to work per segment instead)
>>>     at org.apache.lucene.index.DirectoryReader.fields(DirectoryReader.java:762)
>>>     at org.apache.lucene.misc.HighFreqTerms.main(HighFreqTerms.java:71)
>>>
>>> Tom Burton-West
>>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
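Mike's suggestion to print both hex and a String could look roughly like the stdlib-only Java sketch below. It assumes the term's bytes are already available as a plain `byte[]` (in the flex API they would come from a `BytesRef`, which is not used here), and the class and method names are hypothetical. Decoding goes through a `CharsetDecoder` set to report malformed input, so a term that is not valid UTF-8 is shown as hex only instead of being silently mangled into replacement characters.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: show a term's raw bytes as hex, plus UTF-8 text when the bytes decode cleanly. */
class TermDisplay {

    /** Formats bytes like "[3c 21 2d 2d]", matching the output quoted in the thread. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02x", bytes[i] & 0xff)); // mask to treat the byte as unsigned
        }
        return sb.append(']').toString();
    }

    /** Returns the strict UTF-8 decoding, or null if the bytes are not valid UTF-8. */
    static String tryUtf8(byte[] bytes) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    /** Renders "field:[hex] (text) docFreq", omitting "(text)" when decoding fails. */
    static String format(String field, byte[] term, int docFreq) {
        String text = tryUtf8(term);
        return field + ":" + toHex(term)
                + (text != null ? " (" + text + ")" : "")
                + " " + docFreq;
    }

    public static void main(String[] args) {
        byte[] term = {0x3c, 0x21, 0x2d, 0x2d};   // first term from the sample output
        System.out.println(format("body", term, 509050));   // prints body:[3c 21 2d 2d] (<!--) 509050
        byte[] notUtf8 = {(byte) 0xff, 0x41};     // 0xff never occurs in valid UTF-8
        System.out.println(format("body", notUtf8, 1));     // prints body:[ff 41] 1
    }
}
```

Keeping the hex form always, and adding the decoded text only when decoding succeeds, fits the point made in the thread that flex terms are arbitrary bytes and need not be UTF-8.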
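Lines 70 and 89 of HighFreqTerms collect results with `tiq.insertWithOverflow(...)`. As I understand Lucene's own `PriorityQueue`, that call keeps at most a fixed number of entries and drops the weakest one when a better candidate arrives. The stdlib-only sketch below approximates that "top N by docFreq" behaviour with `java.util.PriorityQueue` used as a min-heap. `TopTerms`, `TermCount` and this `insertWithOverflow` method are hypothetical stand-ins, not Lucene's classes, and terms are plain Strings here for brevity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of a bounded "top N terms by docFreq" collector (not Lucene's classes). */
class TopTerms {

    /** A term's text and its document frequency. */
    static final class TermCount {
        final String term;
        final int docFreq;
        TermCount(String term, int docFreq) { this.term = term; this.docFreq = docFreq; }
        @Override public String toString() { return term + " " + docFreq; }
    }

    private final int maxSize;
    // Min-heap on docFreq: the head is always the weakest entry currently kept.
    private final PriorityQueue<TermCount> heap =
            new PriorityQueue<>(Comparator.comparingInt((TermCount tc) -> tc.docFreq));

    TopTerms(int maxSize) { this.maxSize = maxSize; }

    /** Adds the entry if there is room, or replaces the current minimum if this one is larger. */
    void insertWithOverflow(TermCount tc) {
        if (heap.size() < maxSize) {
            heap.add(tc);
        } else if (maxSize > 0 && tc.docFreq > heap.peek().docFreq) {
            heap.poll();   // evict the smallest docFreq
            heap.add(tc);
        }
    }

    /** Returns the kept entries, highest docFreq first. */
    List<TermCount> result() {
        List<TermCount> out = new ArrayList<>(heap);
        out.sort(Comparator.comparingInt((TermCount tc) -> tc.docFreq).reversed());
        return out;
    }

    public static void main(String[] args) {
        TopTerms tiq = new TopTerms(3);
        // docFreqs from the sample output in the thread, decoded as UTF-8
        String[] terms = {"<!--", "should", "then", "title", "[[United", "known", "under", "United"};
        int[] freqs = {509050, 515495, 525176, 525361, 532586, 533558, 536480, 543746};
        for (int i = 0; i < terms.length; i++) {
            tiq.insertWithOverflow(new TermCount(terms[i], freqs[i]));
        }
        System.out.println(tiq.result()); // prints [United 543746, under 536480, known 533558]
    }
}
```

A min-heap is the usual choice here because each candidate only needs to be compared against the smallest retained entry, so the queue never grows past N no matter how many terms are scanned.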
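The exception and Mike's note say a composite reader no longer hands out a top-level Fields. MultiFields provides a merged view, and working per segment is usually preferred. The sketch below shows, conceptually, what such a merged view has to compute for docFreq. It assumes a term's overall docFreq is the sum of its per-segment docFreqs and uses plain `Map<String, Integer>` objects in place of real segment readers. `MergeSegments` and `merge` are hypothetical names, and no Lucene classes are involved.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: combine per-segment (term -> docFreq) maps into one top-level view. */
class MergeSegments {

    /** Sums each term's docFreq across segments; TreeMap keeps the terms in sorted order. */
    static TreeMap<String, Integer> merge(List<Map<String, Integer>> segments) {
        TreeMap<String, Integer> merged = new TreeMap<>();
        for (Map<String, Integer> segment : segments) {
            for (Map.Entry<String, Integer> e : segment.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Two made-up segments; "under" appears in both.
        Map<String, Integer> seg0 = Map.of("known", 3, "under", 5);
        Map<String, Integer> seg1 = Map.of("under", 2, "United", 4);
        System.out.println(merge(List.of(seg0, seg1))); // prints {United=4, known=3, under=7}
    }
}
```

Going per segment avoids building this combined structure at all. Code like HighFreqTerms could feed each segment's terms straight into its priority queue, at the cost of seeing per-segment rather than index-wide frequencies.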
