On 22 Feb 2007, at 10:09, Martin Braun wrote:
> the only thing I have found in the list before concerning this subject
> is http://issues.apache.org/jira/browse/LUCENE-625, but I'm not sure if
> it does the things I want.
> I am not sure if we get enough queries for a search over an index based
> on the user queries.
If the content of your corpus is static enough, then time is the
friend that will enable you to gather enough user queries to build the
suggestion data set.
Otherwise you have to produce simulated user queries by reducing your
data set to its most common information. Markov chains, or the top n
paths of terms found with Dijkstra or similar, could be an easy way out.
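
To make the Markov chain idea concrete, here is a minimal sketch. It assumes the corpus is available as plain strings and that whitespace tokenization is good enough. The names build_bigram_model and simulate_queries are made up for illustration, and the greedy walk is a simple stand-in for a real top n path search, not Dijkstra.

```python
from collections import Counter, defaultdict


def build_bigram_model(documents):
    """Count term frequencies and how often each term follows another."""
    unigrams = Counter()
    followers = defaultdict(Counter)
    for text in documents:
        terms = text.lower().split()
        unigrams.update(terms)
        for current, nxt in zip(terms, terms[1:]):
            followers[current][nxt] += 1
    return unigrams, followers


def simulate_queries(documents, n=3, max_terms=3):
    """Start at the n most common terms and greedily follow the most
    frequent successor to build one simulated query per start term."""
    unigrams, followers = build_bigram_model(documents)
    queries = []
    for start, _ in unigrams.most_common(n):
        phrase = [start]
        while len(phrase) < max_terms:
            candidates = followers.get(phrase[-1])
            if not candidates:
                break  # no known successor for the last term
            nxt = candidates.most_common(1)[0][0]
            if nxt in phrase:
                break  # avoid walking in circles
            phrase.append(nxt)
        queries.append(" ".join(phrase))
    return queries


docs = [
    "apache lucene search",
    "apache lucene index",
    "apache lucene search engine",
]
print(simulate_queries(docs, n=1, max_terms=3))
```

The resulting phrases can seed the suggestion data set until real user queries have accumulated.
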
You can also start looking at the documents people choose to inspect,
and use these as the base for phrase training.
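
A rough sketch of that idea, assuming you log the ids of the documents users open and can look up a title for each id. The function phrases_from_clicks and the shape of the log are hypothetical. It simply counts every word n-gram in the title of each opened document, so phrases from popular documents get more weight.

```python
from collections import Counter


def phrases_from_clicks(clicked_doc_ids, titles, max_len=3):
    """Weight title n-grams by how often their document was opened."""
    weights = Counter()
    for doc_id in clicked_doc_ids:
        terms = titles[doc_id].lower().split()
        for size in range(1, max_len + 1):
            for i in range(len(terms) - size + 1):
                weights[" ".join(terms[i:i + size])] += 1
    return weights


titles = {1: "Lucene in Action", 2: "Lucene Query Syntax"}
clicks = [1, 1, 2]  # document 1 was opened twice, document 2 once
for phrase, weight in phrases_from_clicks(clicks, titles).most_common(3):
    print(weight, phrase)
```
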
I think you will get further by considering this from a behavioral
psychology angle rather than as a corpus access problem. Also,
navigating a reduced data set (such as the trie in LUCENE-625) instead
of the full corpus it makes suggestions for will save you a lot of
system resources.
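
As an illustration of why a reduced data set is cheap to navigate, here is a small character trie with frequency counts. This is only a sketch under my own assumptions and not the code from LUCENE-625. SuggestionTrie, add and suggest are names invented for the example.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.count = 0  # how many times a query ending here was seen


class SuggestionTrie:
    def __init__(self):
        self.root = TrieNode()

    def add(self, query, count=1):
        """Record a query, creating nodes for each character as needed."""
        node = self.root
        for ch in query.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.count += count

    def suggest(self, prefix, limit=5):
        """Return stored queries starting with prefix, most frequent first."""
        prefix = prefix.lower()
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        found = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.count:
                found.append((current.count, text))
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        # Highest count first, ties broken alphabetically.
        found.sort(key=lambda item: (-item[0], item[1]))
        return [text for _, text in found[:limit]]


trie = SuggestionTrie()
for q in ["lucene index", "lucene index", "lucene query", "luke"]:
    trie.add(q)
print(trie.suggest("luc"))
```

A lookup only touches the nodes below the typed prefix, so it never has to go near the index itself.
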
Hope this helps some.
--
karl