Hi,

I am neck deep in updating the TermVector code from Dmitry. I believe I have
most of it in, except for the SegmentMerge code. Could anyone write a little
about the concepts behind that code?

Also, the File Formats section (under Limitations) says the TermCount (the
number of terms that can be indexed) is currently a 32-bit value, but that the
code is moving towards 64 bits. Which parts, if any, have already been moved? I
was looking in SegmentTermEnum, and the position value there is currently a
long, but the only place it gets assigned (other than where it is incremented in
next()) is the seek() method, which assigns it an int.
In TermInfosReader, some code refers to the position as a long, while other code
refers to it as an int.
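To make the mismatch concrete, here is a hypothetical, simplified model of the
pattern described above. The class name MixedWidthEnum is made up and this is
not Lucene's actual SegmentTermEnum; it only shows a long position field whose
sole setter takes an int:

```java
// Hypothetical illustration only: NOT Lucene's SegmentTermEnum.
// Models a position field declared as long but settable only through an int.
class MixedWidthEnum {
    // Declared long, as if prepared for 64-bit term counts...
    private long position = -1;

    // ...but seek() takes an int, so it can never position the enum
    // beyond Integer.MAX_VALUE. The int -> long widening itself is always safe.
    void seek(int p) {
        position = p;
    }

    // next() only increments, so positions above 2^31 - 1 can be reached
    // by stepping forward, but never restored by a seek().
    boolean next() {
        position++;
        return true;
    }

    long position() {
        return position;
    }

    public static void main(String[] args) {
        MixedWidthEnum e = new MixedWidthEnum();
        e.seek(Integer.MAX_VALUE);
        e.next();
        System.out.println(e.position());                     // 2147483648
        System.out.println(e.position() > Integer.MAX_VALUE); // true
    }
}
```

In other words, the long field buys nothing until every path that sets it
(seek() included) is widened as well.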

In Dmitry's code, he maps Terms to term numbers using the position of the term,
but this won't work once we move to 64-bit fields, since the term numbers are
stored in an array, and Java arrays are indexed by int (only 32-bit addressable).
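A minimal sketch of why the array is the bottleneck, assuming per-term data is
kept in an array indexed by term position. TermNumberTable and termNumberAt()
are hypothetical names, not part of Lucene or of Dmitry's patch:

```java
// Hypothetical sketch, not Lucene code: per-segment data kept in an array
// indexed by term position. Whatever the position type, the array index is int.
class TermNumberTable {
    private final int[] termNumbers;

    TermNumberTable(int[] termNumbers) {
        this.termNumbers = termNumbers;
    }

    // Accepts a long position, as a 64-bit-ready API would, but has to narrow
    // it to int before indexing; positions that do not fit are rejected
    // instead of being silently truncated by the (int) cast.
    int termNumberAt(long position) {
        if (position < 0 || position > Integer.MAX_VALUE) {
            throw new IllegalArgumentException(
                "position " + position + " cannot index a Java array");
        }
        return termNumbers[(int) position];
    }

    public static void main(String[] args) {
        TermNumberTable table = new TermNumberTable(new int[] {7, 3, 9});
        System.out.println(table.termNumberAt(1L)); // 3
        try {
            table.termNumberAt(1L << 32);
        } catch (IllegalArgumentException ex) {
            System.out.println("rejected");
        }
    }
}
```

So even with a long position, a structure like this still caps out at
Integer.MAX_VALUE terms; lifting that limit would need something other than a
single flat array.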

Would it be acceptable to change the position value back to an int until we are
ready to address 64-bit storage as a whole? Or am I missing something about how
position is used? With that change, I have a version that compiles against 1.3,
and in a few days I should have a tested version (I am also writing many new
unit tests) that I can submit for review.

Any insight is appreciated.

Thanks,
Grant Ingersoll