> I'm indexing logs from a transaction-based application.
> ...
> millions of documents per month; the size of the indices is ~35 gigs per month
> (that's the lower bound). I have no choice but to 'store' each field's values
> (as well as indexing/tokenizing them) because I'll need to retrieve them in
> order to create various reports. Also, I have a backlog of ~2 years of logs
> to index!
> ...
> 1- Is there someone out there who has already written an extension to
> Lucene so that the 'stored' string for each document/field is in fact stored
> in a centralized repository? Meaning, only an 'index' is actually stored in
> the document and the real data is put somewhere else.
Do you gain anything from storing the document fields within Lucene? If not, and especially since the log files themselves are kept somewhere, you could make all the 'content' fields unstored (which reduces index size) and add a single stored, non-indexed ID field. That ID can even be a POINTER field, e.g. <log file name + start offset + length>. At search time, for each found document you retrieve this ID/POINTER field and then fetch the full record from the original log file. Makes sense?
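The fetch side of that scheme can be sketched without involving Lucene at all; a minimal example, assuming a hypothetical pointer format of `<file>|<offset>|<length>` (the format, the `PointerFetch` class, and the `fetch` helper are my own illustration, not an existing API):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class PointerFetch {

    // Parse a pointer of the form "<log file>|<start offset>|<length>"
    // and read back exactly that slice of the original log file.
    static String fetch(String pointer) throws IOException {
        String[] parts = pointer.split("\\|");
        try (RandomAccessFile raf = new RandomAccessFile(parts[0], "r")) {
            raf.seek(Long.parseLong(parts[1]));
            byte[] buf = new byte[Integer.parseInt(parts[2])];
            raf.readFully(buf);
            return new String(buf, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny log file and a pointer to its second record.
        Path log = Files.createTempFile("app", ".log");
        String first = "txn=1 status=OK\n";
        String second = "txn=2 status=FAIL\n";
        Files.write(log, (first + second).getBytes(StandardCharsets.UTF_8));

        // The pointer would be the value of the stored, non-indexed field.
        String pointer = log + "|"
                + first.getBytes(StandardCharsets.UTF_8).length + "|"
                + second.getBytes(StandardCharsets.UTF_8).length;
        System.out.println(fetch(pointer).trim());
    }
}
```

In Lucene itself, the pointer string would go into the document as a stored, non-indexed field, so the index carries only a few bytes per document while the bulk of the data stays in the log files.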