[ https://issues.apache.org/jira/browse/LUCENE-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857121#action_12857121 ]
Michael McCandless commented on LUCENE-2393:
--------------------------------------------

Programmatically indexing those docs is fine -- most tests make a MockRAMDir, index a few docs into it, and test against that.

This tool looks useful, thanks Tom!

Note that with flex scoring (LUCENE-2392) we are planning on storing this statistic (sum of tf for the term across all docs) in the terms dict, for fields that enable statistics. So when that lands, this tool can pull from that, or regenerate it if the field didn't store stats.

> Utility to output total term frequency and df from a lucene index
> -----------------------------------------------------------------
>
>                 Key: LUCENE-2393
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2393
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*
>            Reporter: Tom Burton-West
>            Priority: Trivial
>         Attachments: LUCENE-2393.patch
>
>
> This is a command line utility that takes a field name, term, and index
> directory and outputs the document frequency for the term and the total
> number of occurrences of the term in the index (i.e. the sum of the tf of the
> term for each document). It is useful for estimating the size of the term's
> entry in the *prx files and consequent Disk I/O demands
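To make the two numbers concrete: the document frequency (df) counts the documents that contain the term at least once, while the total term frequency sums the per-document tf over all documents. The sketch below is not the code from LUCENE-2393.patch and does not use any Lucene API; it is a stdlib-only illustration over a toy in-memory "index" (a list of token arrays). The names TermStatsSketch, Stats, and compute are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class TermStatsSketch {

    /** Holds the two numbers reported for a single term. */
    static final class Stats {
        final int docFreq;         // number of documents containing the term
        final long totalTermFreq;  // sum of tf for the term across all documents

        Stats(int docFreq, long totalTermFreq) {
            this.docFreq = docFreq;
            this.totalTermFreq = totalTermFreq;
        }
    }

    /**
     * Computes df and total tf for one term over a toy index.
     * Each String[] stands in for the already-tokenized content of one
     * field in one document (an assumption made for this illustration).
     */
    static Stats compute(List<String[]> docs, String term) {
        int docFreq = 0;
        long totalTermFreq = 0;
        for (String[] tokens : docs) {
            int tf = 0; // occurrences of the term in this document
            for (String token : tokens) {
                if (token.equals(term)) {
                    tf++;
                }
            }
            if (tf > 0) {
                docFreq++;           // the document counts once toward df
                totalTermFreq += tf; // every occurrence counts toward the total
            }
        }
        return new Stats(docFreq, totalTermFreq);
    }

    public static void main(String[] args) {
        List<String[]> docs = Arrays.asList(
            "the cat sat on the mat".split(" "),
            "a dog barked".split(" "),
            "the end".split(" "));
        Stats s = compute(docs, "the");
        // "the" occurs twice in the first document and once in the third.
        System.out.println("df=" + s.docFreq + " totalTf=" + s.totalTermFreq);
    }
}
```

Running main prints df=2 totalTf=3. A real tool would walk the term's postings in the index rather than re-scan tokens; the gap between df and the total is what makes the total the better predictor of how many positions are stored for the term.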