[ https://issues.apache.org/jira/browse/LUCENE-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-1340:
---------------------------------------

    Attachment: LUCENE-1340.patch

I attached a new rev of the patch:

  * Use less RAM if a field omits tf's (don't write the tf's into the RAM buffer), so we flush less often
  * Added another test case to TestOmitTf

As a test, I indexed full Wikipedia (~3.2 million docs) with this alg:

{code}
analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
doc.maker=org.apache.lucene.benchmark.byTask.feeds.LineDocMaker
docs.file=/Volumes/External/lucene/wiki.txt
doc.stored = false
doc.term.vector = false
doc.add.log.step=10000
max.field.length=2147483647
directory=FSDirectory
autocommit=false
compound=false
doc.maker.forever = false
work.dir=/lucene/work2
ram.flush.mb=64

- CreateIndex
{ "AddDocs" AddDoc > : *
- CloseIndex

RepSumByPrefRound AddDoc
{code}

With tf's it takes 970 seconds and the index size is 2.5 GB. Without tf's it takes 834 seconds (14% faster) and the index size is 1.1 GB (56% smaller).

> Make it posible not to include TF information in index
> ------------------------------------------------------
>
>                 Key: LUCENE-1340
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1340
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>            Reporter: Eks Dev
>            Priority: Minor
>         Attachments: LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch,
> LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Term Frequency is typically not needed for all fields; some CPU (reading one
> VInt less and one X>>>1...) and IO can be spared by making pure boolean fields
> possible in Lucene. This topic has already been discussed and accepted as a
> part of Flexible Indexing... This issue tries to push things forward a bit
> faster, as I have some concrete customer demands.
> Benefits can be expected for fields that are typical candidates for Filters:
> enumerations, user rights, IDs or very short "texts", phone numbers, zip
> codes, names...
> Status: just passed standard test (compatibility), committed for early review.
> I have not tried the new feature; missing some asserts and one or two unit tests.
> Complexity: simpler than expected
> Can be used via omitTf() (whoever used omitNorms() will know where to find it :)
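
For readers wondering where the "one VInt less and one X>>>1" saving comes from, here is a minimal, self-contained sketch. It is not the code in the attached patch. It assumes the classic postings layout described in Lucene's file-format documentation (doc delta shifted left by one, low bit flagging freq == 1, otherwise the freq follows as a separate VInt), and it assumes that a field omitting tf's simply stores the plain doc delta. The class and method names (OmitTfSketch, encodeWithTf, encodeWithoutTf) are made up for illustration.

```java
import java.io.ByteArrayOutputStream;

// Illustrative only: compares the encoded size of one term's postings
// with and without term frequencies. Not taken from the LUCENE-1340 patch.
final class OmitTfSketch {

    // Variable-length int: 7 data bits per byte, high bit set on every byte except the last.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Assumed layout with tf's: (docDelta << 1) | 1 when freq == 1,
    // otherwise (docDelta << 1) followed by freq as its own VInt.
    static byte[] encodeWithTf(int[] docIds, int[] freqs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int lastDoc = 0;
        for (int i = 0; i < docIds.length; i++) {
            int delta = docIds[i] - lastDoc;
            lastDoc = docIds[i];
            if (freqs[i] == 1) {
                writeVInt(out, (delta << 1) | 1);
            } else {
                writeVInt(out, delta << 1);
                writeVInt(out, freqs[i]);
            }
        }
        return out.toByteArray();
    }

    // Assumed layout without tf's: just the doc delta, so a reader needs
    // neither the shift (X >>> 1) nor the extra VInt read.
    static byte[] encodeWithoutTf(int[] docIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int lastDoc = 0;
        for (int docId : docIds) {
            writeVInt(out, docId - lastDoc);
            lastDoc = docId;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        int[] docIds = {3, 7, 200, 1000}; // sorted doc IDs for one term (made-up data)
        int[] freqs = {1, 4, 2, 1};       // term frequency in each of those docs
        System.out.println("with tf's:    " + encodeWithTf(docIds, freqs).length + " bytes");
        System.out.println("without tf's: " + encodeWithoutTf(docIds).length + " bytes");
    }
}
```

The saving grows with the share of postings whose freq is greater than 1, since each of those pays for an extra VInt when tf's are kept. This toy example says nothing about the Wikipedia index sizes quoted above.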