If you are using Lucene 4.0 and can afford to compress your document dataset 
while indexing, you will save a lot of disk space and also I/O (which in turn 
improves indexing throughput).

In our case it helped a lot: the compressed data was roughly one third the size 
of the original document dataset.

You may want to check the link below.

http://blog.jpountz.net/post/33247161884/efficient-compressed-stored-fields-with-lucene
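
To make the idea concrete, here is a minimal sketch of chunk-wise compression 
of stored documents. It is not Lucene code and not the implementation described 
in the post: the class ChunkedDocStore and all of its methods are hypothetical 
names made up for illustration, and it uses DEFLATE from java.util.zip only 
because that ships with the JDK. The point it tries to show is that when 
documents are compressed in small independent chunks, the document count is 
available without decompressing anything, and reading one document only 
decompresses the chunk that holds it.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical illustration only; not part of Lucene. */
final class ChunkedDocStore {
    private final int chunkSize;                                 // flush threshold, in raw bytes
    private final List<byte[]> chunks = new ArrayList<>();       // independently compressed chunks
    private final List<Integer> rawLengths = new ArrayList<>();  // uncompressed size of each chunk
    private final List<int[]> docs = new ArrayList<>();          // docId -> {chunk, offset, length}
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private long rawBytes = 0;
    private long compressedBytes = 0;

    ChunkedDocStore(int chunkSize) { this.chunkSize = chunkSize; }

    /** Buffers a document and returns its ID; a full buffer is compressed as one chunk. */
    int add(String doc) {
        byte[] bytes = doc.getBytes(StandardCharsets.UTF_8);
        docs.add(new int[] {chunks.size(), pending.size(), bytes.length});
        pending.write(bytes, 0, bytes.length);
        rawBytes += bytes.length;
        if (pending.size() >= chunkSize) flush();
        return docs.size() - 1;
    }

    /** Compresses whatever is buffered into a new chunk. */
    void flush() {
        if (pending.size() == 0) return;
        byte[] raw = pending.toByteArray();
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        chunks.add(out.toByteArray());
        rawLengths.add(raw.length);
        compressedBytes += out.size();
        pending.reset();
    }

    /** Reads one document by decompressing only the chunk that contains it. */
    String get(int docId) {
        flush();
        int[] loc = docs.get(docId);
        if (loc[2] == 0) return "";
        byte[] raw = new byte[rawLengths.get(loc[0])];
        Inflater inflater = new Inflater();
        inflater.setInput(chunks.get(loc[0]));
        try {
            int n = 0;
            while (n < raw.length) {
                int r = inflater.inflate(raw, n, raw.length - n);
                if (r == 0 && (inflater.finished() || inflater.needsInput())) break;
                n += r;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt chunk " + loc[0], e);
        } finally {
            inflater.end();
        }
        return new String(raw, loc[1], loc[2], StandardCharsets.UTF_8);
    }

    /** Total number of documents; needs no decompression at all. */
    int docCount() { return docs.size(); }

    long rawBytes() { return rawBytes; }

    long compressedBytes() { return compressedBytes; }

    public static void main(String[] args) {
        ChunkedDocStore store = new ChunkedDocStore(16 * 1024);
        for (int i = 0; i < 10_000; i++) {
            store.add("2012-12-07 13:03:00 INFO request id=" + i + " status=200");
        }
        store.flush();
        System.out.println("documents:  " + store.docCount());
        System.out.println("raw bytes:  " + store.rawBytes());
        System.out.println("compressed: " + store.compressedBytes());
        System.out.println("doc 4242:   " + store.get(4242));
    }
}
```

The chunk size is the knob here: larger chunks usually compress better, but 
each lookup then has to decompress more data. The ratio you get depends 
entirely on your data, so please measure it on your own logs.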

Regards,
Rahul


-----Original Message-----
From: Ramprakash Ramamoorthy [mailto:youngestachie...@gmail.com]
Sent: 07 December 2012 13:03
To: java-user@lucene.apache.org
Subject: Separating the document dataset and the index dataset

Greetings,

         We are using Lucene in our log analysis tool. We receive around 35 GB 
of data a day, and our practice is to zip indices that are a week old and unzip 
them when the need arises.

           Though the compression offers a huge saving in disk space, the 
decompression becomes an overhead. At times it takes around 10 minutes 
(decompression accounts for 95% of that time) to search across a month-long 
set of logs. We need to unzip the index fully even just to get the total count 
from it.

           My question is this: we are setting Index.Store to true. Is there a 
way to split the index dataset from the document dataset? In my understanding, 
if such a separation is possible, the document dataset alone could be zipped, 
leaving the index dataset on disk. Would it be feasible to do this? 
Any pointers?

           Or is adding more disks the only solution? Thanks in advance!

--
With Thanks and Regards,
Ramprakash Ramamoorthy,
+91 9626975420
