Take a look at LUCENE-5317 [1] and LUCENE-5318 [2]. They're available on my github site [3], and I've pushed them to maven central [4].
LUCENE-5318 is crazily useful as a term/phrase recommender system. I haven't documented either very well yet. I'll try to add documentation to my github site tomorrow.

Let me know if you have any questions.

Cheers,

Tim

[1] https://issues.apache.org/jira/browse/LUCENE-5317
[2] https://issues.apache.org/jira/browse/LUCENE-5318
[3] https://github.com/tballison/lucene-addons (both 5317 and 5318 are under the "5317 project")
[4] https://mvnrepository.com/artifact/org.tallison.lucene/lucene-5317/6.2-0.1

-----Original Message-----
From: José Tomás Atria [mailto:jtat...@gmail.com]
Sent: Monday, September 19, 2016 3:32 PM
To: java-user@lucene.apache.org
Subject: Cooccurrence matrices

Hello All,

I'm trying to use Lucene to create a sliding-window cooccurrence matrix. I've found some old discussion threads on this list that provide some pointers, but most of those are for really old Lucene versions, or rely on components that are no longer available.

So far, I have tried walking over every document, collecting their term vectors, and then counting cooccurrences based on each term vector's per-document index, but this seems a little inefficient to me (not to mention that it requires term vectors), and I was wondering if anyone here has some other idea of how to extract cooccurrence counts from a Lucene index.

Just to be clear: what I need is to collect cooccurrence counts for all terms within a (possibly asymmetric) sliding window around a focal term, for each term in an index.

Any ideas would be greatly appreciated.

Thanks!
jta

--
sent from a phone. please excuse terseness and tpyos.
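
For readers following the thread, the counting described in the question (cooccurrences within a possibly asymmetric window around each focal term) can be sketched without any Lucene dependency. This is only an illustration under stated assumptions: it takes one document's tokens as an in-memory list that is already in position order, and it is not how LUCENE-5317 or LUCENE-5318 implement it. The class name `CooccurrenceSketch`, the method `count`, and the parameters `before` and `after` are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not part of Lucene or the lucene-addons project.
class CooccurrenceSketch {

    /**
     * For every focal token, counts how often each other token occurs within
     * `before` positions to its left and `after` positions to its right.
     * Assumes `tokens` holds one document's terms in position order.
     */
    static Map<String, Map<String, Integer>> count(
            List<String> tokens, int before, int after) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            String focal = tokens.get(i);
            // Clamp the window to the document boundaries.
            int start = Math.max(0, i - before);
            int end = Math.min(tokens.size() - 1, i + after);
            Map<String, Integer> row =
                    counts.computeIfAbsent(focal, k -> new HashMap<>());
            for (int j = start; j <= end; j++) {
                if (j == i) {
                    continue; // skip the focal position itself
                }
                row.merge(tokens.get(j), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("the", "quick", "brown", "fox", "jumps");
        // Window of two tokens to the left and one to the right of each focal term.
        System.out.println(count(doc, 2, 1).get("brown"));
    }
}
```

Summing the per-document maps over all documents would give corpus-level counts. Getting the position-ordered token list out of an index in the first place (for example from term vectors, as in the question) is left out of this sketch.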