I debated this when reimplementing Dmitry's change, but decided on the string 
representation because that is how the terms are stored in the index. Returning Terms 
from getTerms() would mean constructing a Term for every entry on each call, and some 
users may not need them as Terms. The original actually used an integer to represent 
each term in the index, but that only worked with optimized indexes. The strings are 
stored because they are guaranteed to be unique.

In the Term constructor code, the field name is what is being interned, so performance 
should actually be (slightly) improved over time given the number of fields is usually 
small.
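
To make that concrete, here is a rough sketch of building terms from the strings on 
the caller's side. SketchTerm and toTerms are made-up names for illustration, not the 
Lucene classes; the only assumption carried over is that the constructor interns the 
field name and leaves the term text alone.

```java
public class InternSketch {
    // Hypothetical stand-in for a term: only the field name is interned.
    static final class SketchTerm {
        final String field;
        final String text;

        SketchTerm(String field, String text) {
            // intern() returns the canonical instance for this character sequence,
            // so every term on the same field ends up sharing one String object.
            this.field = field.intern();
            this.text = text;
        }
    }

    // Build terms on demand from a field name and the term strings,
    // e.g. the array a term vector hands back.
    static SketchTerm[] toTerms(String field, String[] texts) {
        SketchTerm[] terms = new SketchTerm[texts.length];
        for (int i = 0; i < texts.length; i++) {
            terms[i] = new SketchTerm(field, texts[i]);
        }
        return terms;
    }

    public static void main(String[] args) {
        // new String(...) forces a distinct, non-interned object.
        String field = new String("contents");
        SketchTerm[] terms = toTerms(field, new String[] {"lucene", "vector"});

        // Both terms hold the same interned field instance, so a reference
        // comparison is enough to tell that they belong to the same field.
        System.out.println(terms[0].field == terms[1].field); // prints "true"
    }
}
```

With only a handful of distinct field names, each intern() call after the first just 
finds the existing entry, which is why I would not expect it to be a bottleneck.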

Cheers,
Grant

>>> [EMAIL PROTECTED] 04/25/04 02:55PM >>>
Hi folks,

I started to use the new term vector support. Much more efficient than 
temporarily reindexing documents in a RAMDirectory in order to get their
terms :-)

However, I think it would be more reasonable if the getTerms() method
returned Terms instead of Strings, since this is what I, at least, need in the
subsequent analysis process. Of course it's easy to construct a term given the
field and the text. However, outside the package only the public constructor of 
Term can be called, which does the field.intern(). I don't know how expensive 
the call to intern() really is. Maybe my worries are irrelevant.

Christoph



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED] 
For additional commands, e-mail: [EMAIL PROTECTED] 