Take a look at LUCENE-5317 [1] and LUCENE-5318 [2].

They're available on my GitHub site [3], and I've pushed them to Maven Central 
[4].

LUCENE-5318 is crazily useful as a term/phrase recommender system.

I haven't documented either very well yet.  I'll try to add documentation to my 
GitHub site tomorrow.

Let me know if you have any questions.

Cheers,

           Tim


[1] https://issues.apache.org/jira/browse/LUCENE-5317
[2] https://issues.apache.org/jira/browse/LUCENE-5318
[3] https://github.com/tballison/lucene-addons (both 5317 and 5318 are under 
the "5317 project")
[4] https://mvnrepository.com/artifact/org.tallison.lucene/lucene-5317/6.2-0.1 

-----Original Message-----
From: José Tomás Atria [mailto:jtat...@gmail.com] 
Sent: Monday, September 19, 2016 3:32 PM
To: java-user@lucene.apache.org
Subject: Cooccurrence matrices

Hello All,

I'm trying to use Lucene in order to create a sliding window cooccurrence 
matrix. I've found some old discussion threads on this list that provide some 
pointers, but most of those are for really old lucene versions, or rely on 
components that are no longer available.

So far, I have tried walking over every document, collecting their term vectors 
and then counting cooccurrences based on each term vector's per-document index, 
but this seems a little inefficient to me (not to mention that it requires 
term vectors), and I was wondering if anyone here has some other idea of how to 
extract cooccurrence counts from a Lucene index.

Just to be clear: what I need is to collect cooccurrence counts for all terms 
within a (possibly asymmetric) sliding window around a focal term, for each term 
in an index.
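
To make the window definition concrete, here is a minimal sketch in plain Java. 
It works on a single, already-tokenized document held in memory, not on a 
Lucene index, and it uses no Lucene calls. The class name WindowCooccurrence, 
the method count, and the parameters before and after (the window sizes to the 
left and right of the focal term) are all hypothetical names chosen for 
illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WindowCooccurrence {

    // For each focal token, counts how often every other token occurs within
    // `before` positions to its left or `after` positions to its right.
    // Using different values for `before` and `after` gives an asymmetric window.
    static Map<String, Map<String, Integer>> count(List<String> tokens, int before, int after) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            String focal = tokens.get(i);
            // Clamp the window to the bounds of the document.
            int start = Math.max(0, i - before);
            int end = Math.min(tokens.size() - 1, i + after);
            for (int j = start; j <= end; j++) {
                if (j == i) {
                    continue; // skip the focal position itself
                }
                counts.computeIfAbsent(focal, k -> new HashMap<>())
                      .merge(tokens.get(j), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Toy document; in practice the tokens would come from an analyzer
        // or from positional data stored in the index.
        List<String> doc = List.of("a", "b", "c", "a", "b");
        System.out.println(count(doc, 1, 2));
    }
}
```

Summing the per-document maps over all documents would give the full matrix; 
the open question is how to get the same counts out of the index efficiently.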

Any ideas would be greatly appreciated. Thanks!
jta
-- 

sent from a phone. please excuse terseness and tpyos.
