HighFreqTerms.java

Michael McCandless Sat, 17 Apr 2010 08:43:48 -0700

Ahh you're right!

Though, really, we should not be converting to String (flex terms in
general are an arbitrary byte[], not necessarily utf8).  We should
just use a BytesRef directly in the key.
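
To illustrate why the String conversion is lossy, here is a minimal, self-contained sketch. It does not use Lucene: BytesKey is a hypothetical stand-in for a BytesRef-style key (equality and hashing on the raw bytes), and the class and method names are assumptions made only for this example. Two distinct terms that are not valid utf8 both decode to the same replacement character, so they collide as String keys but stay separate as byte keys.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class TermKeyDemo {

    /** Hypothetical stand-in for a BytesRef-style key: compares raw bytes, never decodes them. */
    static final class BytesKey {
        private final byte[] bytes;

        BytesKey(byte[] bytes) {
            // Defensive copy, so the key stays stable if the caller reuses its buffer.
            this.bytes = bytes.clone();
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BytesKey && Arrays.equals(bytes, ((BytesKey) o).bytes);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(bytes);
        }

        /** Hex form in the same style as the byte output quoted in this thread, e.g. [74 68 65 6e]. */
        String toHex() {
            StringBuilder sb = new StringBuilder("[");
            for (int i = 0; i < bytes.length; i++) {
                if (i > 0) {
                    sb.append(' ');
                }
                sb.append(String.format("%02x", bytes[i] & 0xff));
            }
            return sb.append(']').toString();
        }
    }

    /** Decodes as utf8; malformed input is replaced with U+FFFD, so the result can be lossy. */
    static String asUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two different single-byte terms, neither of which is valid utf8.
        byte[] a = {(byte) 0xff};
        byte[] b = {(byte) 0xfe};

        Map<String, Integer> byString = new HashMap<>();
        byString.put(asUtf8(a), 10);
        byString.put(asUtf8(b), 20); // overwrites the first entry: both keys are "\uFFFD"

        Map<BytesKey, Integer> byBytes = new HashMap<>();
        byBytes.put(new BytesKey(a), 10);
        byBytes.put(new BytesKey(b), 20);

        System.out.println(byString.size()); // prints 1
        System.out.println(byBytes.size());  // prints 2

        // For display, hex plus a best-effort utf8 rendering can be shown side by side.
        byte[] then = {0x74, 0x68, 0x65, 0x6e};
        System.out.println(new BytesKey(then).toHex() + " " + asUtf8(then)); // prints [74 68 65 6e] then
    }
}
```

In the tool itself the key would presumably be the BytesRef, with any utf8 conversion done only at display time.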


Can you open an issue for this Tom?  Thanks!

Mike

On Fri, Apr 16, 2010 at 2:41 PM, Burton-West, Tom <[email protected]> wrote:
> Hi Mike,
>
> Thanks for making the fix and changing the display from bytes to utf8.  It 
> needs a very minor change:
> The latest fix converts to utf8 if you give a field argument on the command 
> line but still shows bytes if you don't.
>
> Line 89 should parallel line 70 and use term.utf8ToString() instead of 
> term.toString():
>
> 70       tiq.insertWithOverflow(new TermInfo(new Term(field, 
> term.utf8ToString()), termsEnum.docFreq()));
> 89       tiq.insertWithOverflow(new TermInfo(new Term(field, 
> term.toString()), terms.docFreq()));
>
> Tom
>
> -----Original Message-----
> From: Michael McCandless [mailto:[email protected]]
> Sent: Wednesday, April 14, 2010 3:50 PM
> To: [email protected]
> Subject: Re: Bug in contrib/misc/HighFreqTerms.java?
>
> OK I committed the fix.  I ran it on a flex wikipedia index I had...
> it produces output like this:
>
> body:[3c 21 2d 2d] 509050
> body:[73 68 6f 75 6c 64] 515495
> body:[74 68 65 6e] 525176
> body:[74 69 74 6c 65] 525361
> body:[5b 5b 55 6e 69 74 65 64] 532586
> body:[6b 6e 6f 77 6e] 533558
> body:[75 6e 64 65 72] 536480
> body:[55 6e 69 74 65 64] 543746
>
> Which is not very readable, but, it does this because flex terms are
> arbitrary byte[], not necessarily utf8... maybe we should fix it to
> print both hex and String if we assume bytes are utf8?
>
> Mike
>
> On Wed, Apr 14, 2010 at 3:25 PM, Michael McCandless
> <[email protected]> wrote:
>> Ugh, I'll fix this.
>>
>> With the new flex API, you can't ask a composite (Multi/DirReader) for
>> its postings -- you have to go through the static methods on
>> MultiFields.  I'm trying to put some distance b/w IndexReader and
>> composite readers... because I'd like to eventually deprecate them.
>> Ie, the composite readers should "hold" an ordered collection of
>> sub-readers, but should not themselves implement IndexReader's API, I
>> think.
>>
>> Thanks for raising this Tom,
>>
>> Mike
>>
>> On Wed, Apr 14, 2010 at 2:14 PM, Burton-West, Tom <[email protected]> wrote:
>>> When I try to run HighFreqTerms.java in Lucene Revision: 933722, I get the
>>> exception appended below.  I believe the line of code involved is a
>>> result of the flex indexing merge. Should I post this as a comment to
>>> LUCENE-2370 (Reintegrate flex branch into trunk)?
>>>
>>> Or is there simply something wrong with my configuration?
>>>
>>> Exception in thread "main" java.lang.UnsupportedOperationException: please
>>> use MultiFields.getFields if you really need a top level Fields (NOTE that
>>> it's usually better to work per segment instead)
>>>         at
>>> org.apache.lucene.index.DirectoryReader.fields(DirectoryReader.java:762)
>>>         at org.apache.lucene.misc.HighFreqTerms.main(HighFreqTerms.java:71)
>>>
>>> Tom Burton-West
>>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
