Hi,
I am looking for a way to represent term frequency data in a vector
space, that is, using unique integer identifiers instead of strings.
This would allow feeding tools like LIBSVM from a Lucene index.
A small example: TermFreqVector.toString() produces
"{TITLE: one/3, two/4}". What I am looking for is "1:3 2:4", where 1
and 2 are arbitrary identifiers. The order of the entries does not
matter.
The task can obviously be solved with a java.util.Map from term to
identifier, but I expect that to be less efficient than using native
Lucene methods.
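
For reference, a minimal sketch of the Map-based approach could look
like the code below. It assumes the parallel arrays come from
TermFreqVector.getTerms() and TermFreqVector.getTermFrequencies(); the
class TermIdMapper and its methods idOf and toSparseLine are
hypothetical names used only for illustration, not part of Lucene. Ids
are assigned in first-seen order, starting at 1.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not a Lucene class): maps each distinct term to an integer id. */
class TermIdMapper {
    // term -> id; ids are handed out in first-seen order, starting at 1
    private final Map<String, Integer> ids = new HashMap<String, Integer>();

    /** Returns the id of a term, assigning the next free id the first time it is seen. */
    int idOf(String term) {
        Integer id = ids.get(term);
        if (id == null) {
            id = ids.size() + 1;
            ids.put(term, id);
        }
        return id;
    }

    /**
     * Formats parallel term/frequency arrays as "id:freq id:freq ...".
     * Assumption: the arrays are what TermFreqVector.getTerms() and
     * getTermFrequencies() return for one document field.
     */
    String toSparseLine(String[] terms, int[] freqs) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < terms.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(idOf(terms[i])).append(':').append(freqs[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TermIdMapper mapper = new TermIdMapper();
        // Same data as the "{TITLE: one/3, two/4}" example above
        System.out.println(mapper.toSparseLine(
                new String[] {"one", "two"}, new int[] {3, 4}));
    }
}
```

Reusing one TermIdMapper instance across all documents keeps the ids
consistent over the whole index, at the cost of holding every distinct
term in memory.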
I am using Lucene 2.9.1, and my index can be considered constant.
Thanks,
Illes Solt