Hello everyone, I am indexing a collection of XML files. I select a few tags, and the content of each selected tag in an XML file is indexed in a separate field of a document.
I need to get the document frequency (the number of documents that contain the term) of each term. The problem is that I get a TermVector per field, so the document frequency I can read is per field as well. If I sum a term's document frequency over all fields, a document that has the same term in several fields (tags) is counted more than once.

Is there an (efficient) way to get the document frequency without counting a document more than once? I can't add another field at index time holding the content of every tag I want to index, because I use a different set of filters for each tag.

Thanks in advance.

--
View this message in context: http://lucene.472066.n3.nabble.com/Document-frequency-with-multiple-fields-tp4240785.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
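
To make the overcounting concrete, here is a minimal sketch in plain Java. It does not use the Lucene API. The class `CrossFieldDocFreq`, its methods, and the nested map standing in for per-field postings are all hypothetical names chosen for illustration. It compares the summed per-field document frequency with the number of distinct documents that contain the term in any field, which is the value I am after.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class CrossFieldDocFreq {

    // Hypothetical stand-in for per-field postings:
    // field name -> term -> ids of the documents containing that term in that field.
    static void add(Map<String, Map<String, Set<Integer>>> postings,
                    String field, String term, int docId) {
        postings.computeIfAbsent(field, f -> new HashMap<>())
                .computeIfAbsent(term, t -> new HashSet<>())
                .add(docId);
    }

    // Naive approach: sum the per-field document frequencies.
    // A document with the term in k fields is counted k times.
    static int summedDocFreq(Map<String, Map<String, Set<Integer>>> postings, String term) {
        int sum = 0;
        for (Map<String, Set<Integer>> terms : postings.values()) {
            Set<Integer> docs = terms.get(term);
            if (docs != null) {
                sum += docs.size();
            }
        }
        return sum;
    }

    // Desired value: size of the union of the per-field document id sets,
    // so each document is counted once no matter how many fields match.
    static int distinctDocFreq(Map<String, Map<String, Set<Integer>>> postings, String term) {
        Set<Integer> union = new HashSet<>();
        for (Map<String, Set<Integer>> terms : postings.values()) {
            Set<Integer> docs = terms.get(term);
            if (docs != null) {
                union.addAll(docs);
            }
        }
        return union.size();
    }

    public static void main(String[] args) {
        Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();
        // Document 0 has "lucene" in both fields, document 1 only in "body".
        add(postings, "title", "lucene", 0);
        add(postings, "body", "lucene", 0);
        add(postings, "body", "lucene", 1);

        System.out.println("summed:   " + summedDocFreq(postings, "lucene"));
        System.out.println("distinct: " + distinctDocFreq(postings, "lucene"));
    }
}
```

In the example data the summed value is 3 while only 2 documents contain the term. Note that the sketch assumes the term text is identical in every field, which may not hold when each field goes through different filters.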