This is something I debated when reimplementing Dmitry's change, but I decided on the String representation because that is how the terms are stored in the index. Calling getTerms() would otherwise have to construct Term objects anyway, and some users may not need them as Terms. The original implementation actually used an integer to represent each term in the index, but that approach only worked with optimized indexes. The strings are stored because they are guaranteed to be unique.
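The trade-off above (returning term texts as Strings and letting callers build term objects only when they need them) can be sketched without Lucene on the classpath. This is a minimal illustration under stated assumptions: `SimpleTerm` and `toTerms` are hypothetical stand-ins, not Lucene's `Term` class or term vector API, and the sample term texts are made up. The constructor interns the field name to mirror the behaviour of the public `Term` constructor described in this thread.

```java
import java.util.ArrayList;
import java.util.List;

public class TermVectorTermsSketch {

    // Hypothetical stand-in for a (field, text) term; NOT Lucene's Term class.
    static final class SimpleTerm {
        final String field;
        final String text;

        SimpleTerm(String field, String text) {
            // Intern the field name, mirroring what the thread says the public
            // Term constructor does. With few distinct field names, this mostly
            // returns an already-interned instance.
            this.field = field.intern();
            this.text = text;
        }

        @Override
        public String toString() {
            return field + ":" + text;
        }
    }

    // Hypothetical helper: builds term objects on demand from the String[]
    // of term texts that a term vector exposes for one field.
    static List<SimpleTerm> toTerms(String field, String[] texts) {
        List<SimpleTerm> result = new ArrayList<>(texts.length);
        for (String text : texts) {
            result.add(new SimpleTerm(field, text));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up term texts; in Lucene they would come from a term vector.
        String[] texts = {"apache", "lucene", "vector"};
        // new String(...) creates a distinct, non-interned field name instance.
        List<SimpleTerm> terms = toTerms(new String("contents"), texts);
        System.out.println(terms);
        // Every term shares the same interned field instance.
        System.out.println(terms.get(0).field == terms.get(2).field);
    }
}
```

Running `main` prints `[contents:apache, contents:lucene, contents:vector]` followed by `true`. Callers that only need the raw texts never pay for object construction, while callers that need term objects can build them in one loop.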
In the Term constructor code, it is the field name that gets interned, so given that the number of distinct fields is usually small, performance should actually improve (slightly) over time.

Cheers,
Grant

>>> [EMAIL PROTECTED] 04/25/04 02:55PM >>>
Hi folks,

I started to use the new term vector support. It is much more efficient than temporarily reindexing documents in a RAMDirectory in order to get their terms :-)

However, I think it would be more reasonable if the getTerms() method returned Terms instead of Strings, since that is what I, at least, need in the subsequent analysis process.

Of course, it's easy to construct a term given the field and the text. However, outside the package only the public constructor of Term can be called, and it does the field.intern(). I don't know how expensive the call to intern() really is. Maybe my worries are irrelevant.

Christoph

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
