Document frequency with multiple fields

renanmach Wed, 18 Nov 2015 05:57:39 -0800

Hello everyone,

I am indexing a collection of XML files. I select a few tags, and each
selected tag of an XML file is indexed in a separate field of a document.


I need the document frequency of each term (the number of documents that
contain the term). The problem is that I get a TermVector for each field.
If I sum the document frequency of each term across fields, documents that
contain the same term in different fields (tags) are counted more than once.
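
To make the double counting concrete, here is a minimal sketch in plain Java.
It does not use any Lucene API: the class name, the field names "title" and
"abstract", and the doc IDs are all made up for illustration, and the BitSets
only stand in for the set of documents that contain one term in each field.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not Lucene code.
class DocFreqSketch {

    // Naive count: adds up the per-field document frequencies, so a
    // document that has the term in two fields is counted twice.
    static int summedDocFreq(Map<String, BitSet> docsByField) {
        int sum = 0;
        for (BitSet docs : docsByField.values()) {
            sum += docs.cardinality();
        }
        return sum;
    }

    // Wanted count: union the per-field doc ID sets first, then count,
    // so each document contributes at most once.
    static int distinctDocFreq(Map<String, BitSet> docsByField) {
        BitSet union = new BitSet();
        for (BitSet docs : docsByField.values()) {
            union.or(docs);
        }
        return union.cardinality();
    }

    // Helper that builds a BitSet from a list of doc IDs.
    static BitSet docs(int... ids) {
        BitSet set = new BitSet();
        for (int id : ids) {
            set.set(id);
        }
        return set;
    }

    public static void main(String[] args) {
        // Doc IDs containing one term, per field (made-up data).
        Map<String, BitSet> docsByField = new LinkedHashMap<>();
        docsByField.put("title", docs(0, 2));
        docsByField.put("abstract", docs(0, 1, 2));

        System.out.println(summedDocFreq(docsByField));   // 5
        System.out.println(distinctDocFreq(docsByField)); // 3
    }
}
```

The second number is the one I am after. Building such a union for every
term is what I would like to avoid if there is a cheaper way.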

Is there any (efficient) way to get the document frequency without counting
a document more than once?

I can't add another field at indexing time that holds the content of every
tag I want to index, because I use a different set of filters for each tag.

Thanks in advance.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Document-frequency-with-multiple-fields-tp4240785.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
