On Sat, May 06, 2006 at 01:16:43AM +0000, George Washington wrote:
> I am using Lucene to index as well as to store complete source documents
> (typically few tens of thousands of documents, not millions). I would like
> to protect the source documents with encryption but have the following
> questions:
It's hard to suggest anything without knowing exactly what your scenario is and what threat you are trying to protect against with encryption. From whom exactly are you trying to "protect the source documents"?

For example, people have already told you that an approximation of the source documents can be reconstructed from the index if it contains position information. An attacker could then at least recover all the terms of each document and their order.

You could try to guard against that by hashing the terms. But then wildcard queries and range queries would no longer work, and a dedicated attacker could still run a dictionary attack or a brute-force attack to recover the index term behind each hash.

You could store the encrypted, base64-encoded source documents in a stored but unindexed field. (NB: Is base64 really necessary here?) But where would the decryption key come from? Would it be the same for every document, or a different one for each?

If you just encrypt the whole index, it becomes just another file encryption problem, and you can use anything you like (PGP, SSL, an encrypted loopback device on Linux, a Mac OS encrypted disk image, etc.). But then the question has nothing to do with Lucene.

So what exactly are you trying to accomplish? What are you trying to guard against?

Regards, Sebastian

--
Sebastian Kirsch <[EMAIL PROTECTED]> [http://www.sebastian-kirsch.org/]
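To make the point about hashing terms concrete, here is a minimal sketch in plain Python. It does not use the Lucene API. The function names, the whitespace tokenizer, and the choice of unkeyed SHA-256 are all assumptions for illustration only. It shows that exact-term lookup still works on hashed terms, that a prefix lookup has nothing to match against, and that a simple dictionary attack recovers the terms and their order.

```python
import hashlib


def hash_term(term: str) -> str:
    """Hash one index term with plain, unkeyed SHA-256 (an assumed choice)."""
    return hashlib.sha256(term.encode("utf-8")).hexdigest()


def hash_tokens(text: str) -> list[str]:
    """Lowercase, split on whitespace, and hash each token, keeping the order."""
    return [hash_term(token) for token in text.lower().split()]


def dictionary_attack(hashed_terms: list[str], dictionary: list[str]) -> list:
    """Hash every candidate word and map the stored hashes back to words.

    Terms that are not in the dictionary come back as None.
    """
    lookup = {hash_term(word): word for word in dictionary}
    return [lookup.get(h) for h in hashed_terms]


# Hypothetical document; the hashed list stands in for an index with positions.
document = "the quick brown fox"
indexed = hash_tokens(document)

# Exact-term lookup still works: hash the query term the same way.
exact_match = hash_term("quick") in indexed

# A prefix query such as "qui*" cannot be answered: the only thing that can be
# looked up is the hash of a complete term, and "qui" is not one of them.
prefix_lookup = hash_term("qui") in indexed

# An attacker who has only the hashed terms plus a word list gets the text back.
recovered = dictionary_attack(indexed, ["brown", "fox", "lazy", "quick", "the"])

print(exact_match, prefix_lookup, recovered)
```

Range queries fail for the same reason as the prefix lookup: the hash values do not preserve the ordering of the original terms.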