I'm trying to use Lucene to build a sliding-window co-occurrence
matrix. I've found some old discussion threads on this list that provide
some pointers, but most of them target really old Lucene versions or
rely on components that are no longer available.
So far, I have tried walking over every document, collecting its term
vector, and then counting co-occurrences from the per-document positions
stored in each term vector. This seems a little inefficient to me (not to
mention that it requires term vectors to be stored), so I was wondering
if anyone here has another idea for extracting co-occurrence counts from
a Lucene index.
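A minimal sketch of the term-vector approach described above, written in Python for brevity rather than against the Lucene API. It assumes each document's term vector (stored with positions) has already been read out into a plain dict mapping term to a list of positions; that dict layout and the function names `invert_term_vector` and `count_cooccurrences` are hypothetical. The per-document work is to invert term to positions into position to terms and then scan a window around each position.

```python
from collections import Counter, defaultdict


def invert_term_vector(term_positions):
    """Turn a term -> positions mapping into position -> [terms].

    A list per position keeps tokens that share a position (e.g. synonyms
    injected with a position increment of 0) instead of overwriting them.
    """
    by_position = defaultdict(list)
    for term, positions in term_positions.items():
        for pos in positions:
            by_position[pos].append(term)
    return by_position


def count_cooccurrences(docs, left, right):
    """Sum (focal, context) pair counts over all documents.

    `docs` is an iterable of per-document term vectors, each a dict mapping
    term -> list of positions (a hypothetical stand-in for Lucene term
    vectors with positions). The window covers `left` positions before and
    `right` positions after each focal token; offset 0 is skipped.
    """
    counts = Counter()
    for term_positions in docs:
        by_position = invert_term_vector(term_positions)
        for pos, focal_terms in by_position.items():
            for offset in range(-left, right + 1):
                if offset == 0:
                    continue
                # .get() does not insert missing keys into the defaultdict
                for context in by_position.get(pos + offset, ()):
                    for focal in focal_terms:
                        counts[(focal, context)] += 1
    return counts


# Usage: one document "the quick fox the", window of 0 left / 2 right.
doc = {"the": [0, 3], "quick": [1], "fox": [2]}
print(count_cooccurrences([doc], left=0, right=2))
```

Because the window is measured in positions, gaps left by removed stopwords still count toward its width; counting over surviving tokens instead would mean sorting the positions and windowing over list indices. Either way, every document's term vector has to be read in full, which is the cost the question is trying to avoid.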
Just to be clear: what I need is co-occurrence counts for all terms
within a (possibly asymmetric) sliding window around a focal term, for
every term in the index.
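One way to state the requested count precisely, assuming a window of L positions to the left and R positions to the right of the focal term (symbols introduced here for illustration), with w_{d,i} denoting the term at position i of document d:

```latex
C(t, u) = \sum_{d} \; \sum_{i \,:\, w_{d,i} = t} \;
          \sum_{\substack{j = i - L \\ j \neq i}}^{i + R}
          \mathbf{1}\left[ w_{d,j} = u \right]
```

A symmetric window is the special case L = R, and C(t, u) generally differs from C(u, t) when L and R differ.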
Any ideas would be greatly appreciated. Thanks!
sent from a phone. please excuse terseness and tpyos.