Yep, and without frequencies as well. This should not be all that difficult (IMHO), especially now that we have DocIdSetIterator as a superclass. As a matter of fact, you could even today get a DocIdSetIterator from TermDocs or whatever and use it as a Filter, as a lightweight, in-memory solution ... A real solution would require something like a postings "type flag".

----- Original Message ----
From: robert engels <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Thursday, 7 February, 2008 7:43:33 PM
Subject: postings without position information ?

I think there are many uses of Lucene that would benefit from 'enum' fields, aka categories. When classifying documents, they are often in one or more categories.

Lucene could write these postings very efficiently using VINT and RLE (run length encoding) if the position information was not stored (since it is not really useful in these typical cases).

StartingDocNum|NumberOfDocuments...StartingDocNum|NumberOfDocuments

using a bit of the StartingDocNum to know if it was a series.

When a lot of documents are in the same category, and they are added at the same time, the document numbers would be nearly sequential, allowing very efficient compression.

Has anyone worked on this? Our previous custom IndexReaderWriter supported it, and I was wondering if this has made it into the core. I checked the docs/email and could not find anything.

Thanks.
Robert

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
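
To make the StartingDocNum|NumberOfDocuments layout from the quoted message concrete, here is a minimal sketch in plain Java. It is not Lucene code and not the custom IndexReaderWriter mentioned above. The class name RunLengthPostings and its methods are hypothetical. It also makes two assumptions the message does not state: the low bit of the first value is used as the "series" flag, and start values are stored as gaps from the previously written document number so that the VInts stay small.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: run-length encoded doc IDs without freq/position data.
class RunLengthPostings {

    // VInt: 7 data bits per byte, high bit set on every byte except the last.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    static int readVInt(ByteArrayInputStream in) {
        int b = in.read();
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.read();
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // Input must be strictly increasing doc IDs. Gaps must stay below 2^30
    // because one bit of the first value is taken by the series flag.
    static byte[] encode(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0; // last doc ID written so far (assumed gap coding)
        int i = 0;
        while (i < sortedDocIds.length) {
            int start = sortedDocIds[i];
            int j = i + 1;
            // Extend the run while the doc IDs are consecutive.
            while (j < sortedDocIds.length && sortedDocIds[j] == sortedDocIds[j - 1] + 1) {
                j++;
            }
            int count = j - i;
            int gap = start - prev;
            if (count > 1) {
                writeVInt(out, (gap << 1) | 1); // low bit 1: a series follows
                writeVInt(out, count);          // NumberOfDocuments
            } else {
                writeVInt(out, gap << 1);       // low bit 0: single document
            }
            prev = start + count - 1;
            i = j;
        }
        return out.toByteArray();
    }

    static int[] decode(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        List<Integer> docs = new ArrayList<>();
        int prev = 0;
        while (in.available() > 0) {
            int head = readVInt(in);
            int start = prev + (head >>> 1);
            int count = (head & 1) != 0 ? readVInt(in) : 1;
            for (int k = 0; k < count; k++) {
                docs.add(start + k);
            }
            prev = start + count - 1;
        }
        int[] result = new int[docs.size()];
        for (int k = 0; k < result.length; k++) {
            result[k] = docs.get(k);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] docs = {3, 100, 101, 102, 103, 104, 200};
        byte[] encoded = encode(docs);
        System.out.println(docs.length + " doc IDs -> " + encoded.length + " bytes");
    }
}
```

With this layout a block of consecutive document numbers costs one flagged start value plus one count, however long the run is, which is the compression win described above for documents of one category added at the same time. An isolated document still costs only its own VInt.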