It seems like there should be a formula for estimating the total
number of unique terms, given the unique term counts for each segment
and certain assumptions, such as random document distribution across
segments.
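
A rough sketch of one way such an estimate might be made, not a known
Lucene formula: assume vocabulary growth follows Heaps' law,
V(n) = K * n^beta, and that each segment is a random sample of the
collection. Fitting K and beta to the per-segment (size, unique terms)
pairs then lets you extrapolate to the combined size. The function name
and the "size" input (document or token count per segment) are
hypothetical, and the fit needs at least two segments of different sizes.

```python
import math


def estimate_total_unique_terms(segments):
    """Estimate the unique term count across all segments.

    segments: list of (size, unique_terms) pairs, one per segment.
    "size" is an assumed per-segment measure such as document or token count.
    """
    if len(segments) < 2:
        raise ValueError("need at least two segments to fit the curve")

    # Heaps' law is linear in log-log space: log V = log K + beta * log n.
    xs = [math.log(size) for size, _ in segments]
    ys = [math.log(unique) for _, unique in segments]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("segments must differ in size to fit a slope")

    # Ordinary least-squares fit of the slope (beta) and intercept (log K).
    beta = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    log_k = mean_y - beta * mean_x

    # Extrapolate the fitted curve to the combined size of all segments.
    total_size = sum(size for size, _ in segments)
    estimate = math.exp(log_k + beta * math.log(total_size))

    # The true total can never be below the largest segment's count
    # or above the sum of all segment counts, so clamp to those bounds.
    lower = max(unique for _, unique in segments)
    upper = sum(unique for _, unique in segments)
    return min(max(estimate, lower), upper)
```

For example, estimate_total_unique_terms([(100, 100), (400, 200), (900, 300)])
fits K = 10 and beta = 0.5 and extrapolates to a size of 1400. The
clamping step only enforces the hard bounds; how good the estimate is
depends entirely on how well the assumptions hold for the index.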

-Yonik
http://www.lucidimagination.com

On Thu, May 27, 2010 at 9:17 PM, kannan chandrasekaran
<ckanna...@yahoo.com> wrote:
> I am just trying out a few experiments to calculate similarity between terms
> based on their co-occurrences in the dataset. Basically, I am trying to
> build contextual vectors and calculate similarity using a similarity measure
> (say, cosine similarity).
>
> I don't think this is an XY problem. The vectors I am trying to build are not
> the same as the TermVectors option ((term, freq) pairs per document) in
> Lucene (if that's what you meant).
>
> Thanks
> Kannan
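
To make the quoted idea concrete, here is a minimal sketch of contextual
vectors and cosine similarity. It assumes co-occurrence means "appears in
the same document" and works on plain lists of tokens rather than a Lucene
index; the function names are hypothetical and nothing here uses the
Lucene API.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(docs):
    """Map each term to a Counter of the terms it shares a document with."""
    vectors = defaultdict(Counter)
    for doc in docs:
        terms = set(doc)  # count each pair at most once per document
        for term in terms:
            for other in terms:
                if other != term:
                    vectors[term][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


docs = [
    ["cat", "milk"],
    ["dog", "bone"],
    ["cat", "fish"],
    ["kitten", "milk"],
    ["kitten", "fish"],
]
vectors = cooccurrence_vectors(docs)
similarity = cosine(vectors["cat"], vectors["kitten"])
```

Here "cat" and "kitten" never share a document but have the same
contexts, so they score as highly similar, while "cat" and "dog" share
no context terms and score zero.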

