Hello,

I have a question about how idf is computed for different fields. As we know,

    idf = Math.log(numDocs/(docFreq+1)) + 1.0

docFreq is field-specific, but numDocs is a single count shared by all fields.

For example, assume there are 1M docs, i.e. numDocs = 10^6. All of the docs have field_1, but only 10,000 have a non-empty field_2. For a given word we might then have docFreq(field_1) = 1000 and docFreq(field_2) = 10, so the idf for field_2 will be much higher than for field_1:

    idf(field_1) = ln(10^6/(1000+1)) + 1
    idf(field_2) = ln(10^6/(10+1)) + 1

If I then use a DisjunctionMaxQuery over field_1 and field_2, the scores are unfair: a match in field_2 is favored even though the word is equally common (0.1%) among the docs that actually have each field. setBoost is not an ideal way to correct this. Any suggestions?

Actually, I think the current result is somewhat unreasonable. Do you think it is worth a JIRA ticket to replace the total numDocs in the idf expression with the number of docs that have a non-empty value for the specific field?

Thanks!

--
Regards,
Boyan
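
P.S. To make the numbers concrete, here is a minimal sketch in plain Java. It is not Lucene code; the class name IdfSketch, the idf helper and its parameter order are only for illustration. It evaluates the formula above for the example, once with the shared numDocs and once assuming numDocs were replaced by the number of docs with a non-empty field_2:

```java
public class IdfSketch {
    // idf as quoted above: ln(numDocs / (docFreq + 1)) + 1
    // (hypothetical helper, not a Lucene API)
    public static double idf(long docFreq, long numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        long numDocs = 1_000_000L;      // all docs in the index
        long docsWithField2 = 10_000L;  // docs with a non-empty field_2

        // Current behaviour: both fields share the same numDocs.
        System.out.printf("idf(field_1), shared numDocs:    %.3f%n", idf(1000, numDocs));
        System.out.printf("idf(field_2), shared numDocs:    %.3f%n", idf(10, numDocs));

        // Proposed behaviour (assumption): use the per-field doc count instead.
        System.out.printf("idf(field_2), per-field numDocs: %.3f%n", idf(10, docsWithField2));
    }
}
```

This prints roughly 7.907, 12.418 and 7.812, so with a per-field doc count the two fields would get nearly the same idf for this word.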