Victor - Your point is well taken: a comprehensive encryption strategy is not quite analogous to compression. It involves more than a transformation of field values into a different form, since it requires (at a minimum) that all the data structures that make up the index be encrypted too. Maybe I spoke too soon.
However, after considering this more, I think the scheme would need to be quite invasive to provide good security. Just plugging in encryption simplistically would leave the index very vulnerable to side channel attacks. It seems an attacker can get clear text terms encrypted via the particular index's QueryParser implementation and eventually build a fairly complete decryption lookup table using Lucene's data structures, thus undermining the security of the internal data structures. (Encrypted payloads would potentially be unaffected, unless they corresponded to index Terms.)

Let's say this weakness is OK with you. Using the current API, I think you can achieve your ends by using encrypted binary field values and adding a trailing org.apache.lucene.analysis.TokenFilter, used at both index and query time, that encrypts and Base64 encodes its input (has to be a String). This would effectively give you an encrypted form of Lucene's internal data structures.

In addition to my security concerns with the concept, I still agree with the philosophical issues raised so far on the related field compression topic. It seems inevitable to me that if encryption support were added, application developers would eventually try to sell Lucene developers on adding features to it, on top of supporting and maintaining it (à la a configurable compression quality factor).

A configurable, encrypting Base64 TokenFilter would also be a cool contrib.

Luke

On 11/29/06, negrinv <[EMAIL PROTECTED]> wrote:
Thank you Luke for your comments and the references you supplied. I read through them and reached the following conclusions.

There seems to be a philosophical issue about the boundary between a user application and the Lucene API: where one should start and the other stop. The other issue is the significant difference between compression and encryption.

As far as the first issue is concerned, it is really a matter of personal choice and preference. My feeling is that as long as adding functionality does not impair the performance of the API as a whole, it makes sense to add it to Lucene and thus simplify the task of the application developer. After all, application developers do not have to use all the features of the API. They always have the option of subclassing, writing a better version of a feature if they can, or writing the functionality as part of the application, even if the API already provides it. The API is there to make life easier for those developers who want to use it; nobody "has" to use it.

The second issue is more technical. Compression simply compresses the stored data to save storage. The index itself is not compressed, therefore searching proceeds as normal. With encryption, however, you must encrypt the index as well as the stored data, otherwise one could reconstruct the source document from the index and thus defeat the purpose of encryption. Correct me if I am wrong, but I think that encrypting the Lucene index is not easy to achieve from outside of Lucene. It implies rewriting, as part of the application, much code that is now part of Lucene (see issue number one above), hence my preference for including it as part of the Lucene API rather than as part of the application.

Victor

Luke Nezda wrote:
>
> I think that adding encryption support to Lucene fields is a bad idea for
> the same reasons adding compression was a bad idea (conclusive comments on
> the tail of this issue
> http://issues.apache.org/jira/browse/LUCENE-648?page=all).
> Binary fields can be used by users to achieve this end. Maybe a contrib
> with utility methods would be a compromise to preserve this work and make
> it accessible to others, or alternatively just a FAQ entry with the sample
> code or references to it.
> Luke
>
> On 11/29/06, negrinv <[EMAIL PROTECTED]> wrote:
>>
>> Attached are proposed modifications to Lucene 2.0 to support
>> Field.Store.Encrypted.
>> The rationale behind this proposal is simple. Since Lucene can store data
>> in the index, it effectively makes the data portable. It is conceivable
>> that some of the data may be sensitive in nature, hence the option to
>> encrypt it.
>> Both the data and its index are encrypted in this implementation.
>> This is only an initial implementation. It has several restrictions, all
>> of which can be resolved if required, albeit with some effort and more
>> changes to Lucene:
>> 1) Binary and compressed fields cannot be encrypted as well (a plaintext
>> once encrypted becomes binary).
>> 2) Field.Store.Encrypted implies Field.Store.Yes.
>> This makes sense, but it forces one to store the data in the same index
>> where the tokens are stored. It may be preferable at times to have two
>> indices, one for tokens, the other for the data.
>> 3) As implemented, it uses RC4 encryption from BouncyCastle. This is an
>> open source package, very simple to use, which has the advantage of
>> guaranteeing that the length of the encrypted field is the same as the
>> original plaintext. As of Java 1.5 (5.0) Sun provides an RC4 equivalent
>> in its Java Cryptography Extension, but unfortunately not in Java 1.4.
>> The BouncyCastle RC4 is not the only algorithm available; others not
>> depending on third party code can be used, but it was just the simplest
>> to implement for this first attempt.
>> 4) The attachments are modifications in diff form based on an early (I
>> think August or September '06) repository snapshot of Lucene 2.0,
>> subsequently updated from the Lucene repository on 29/11/06. They may
>> need some additional work to merge with the latest version in the Lucene
>> repository. They also include a couple of JUnit test programs which
>> explain, as well as test, the usage. You will need the BouncyCastle .jar
>> (bcprov-jdk14-134.jar) to run them. I did not attach it, to minimize the
>> size of the attachments, but it can be downloaded free from:
>> http://www.bouncycastle.org/latest_releases.html
>>
>> 5) Searching an encrypted field is restricted to single terms; no phrase
>> or boolean searches are allowed yet, and the term has to be encrypted by
>> the application before searching for it (ref. attached JUnit test
>> programs).
>>
>> To the extent that I have tested it, the code works as intended and does
>> not appear to introduce any regression problems, but more testing by
>> others would be desirable.
>> I don't propose at this stage to do any further work on these API
>> extensions unless there is some expression of interest and direction from
>> the Lucene Developers team. I have an application ready to roll which
>> uses the proposed Lucene encryption API additions (please see
>> http://www.kbforge.com/index.html). The application is not yet available
>> for downloading simply because I am not sure if the Lucene licence allows
>> me to do so. I would appreciate your advice in this regard. My
>> application is free but its source code is not available (yet). I should
>> add that encryption does not have to be an integral part of Lucene; it
>> can be just part of the end application, but somehow it seems to me that
>> Field.Store.Encrypted belongs in the same category as compression and
>> binary values.
>> I would be happy to receive your feedback.
>>
>> victor negrin
>>
>> http://www.nabble.com/file/4376/luceneDiff2.txt luceneDiff2.txt
>> http://www.nabble.com/file/4377/TestEncryptedDocument.java TestEncryptedDocument.java
>> http://www.nabble.com/file/4378/TestDocument.java TestDocument.java
>> --
>> View this message in context:
>> http://www.nabble.com/Attached-proposed-modifications-to-Lucene-2.0-to-support-Field.Store.Encrypted-tf2727614.html#a7607415
>> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
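
To make the idea at the top of the thread concrete, here is a minimal sketch of the encrypt-then-Base64 term transformation Luke describes, together with the lookup-table weakness he warns about. It is not the code from the attached diff and it does not use Lucene; in practice something like encryptTerm would presumably be called from inside a TokenFilter at both index and query time. The class name, the method names, the TermOracle interface and the demo key are all hypothetical. AES in ECB mode from the standard JCE is an assumption, chosen only because it is deterministic and ships with every JDK (the proposal itself used RC4 from BouncyCastle).

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

class EncryptedTermSketch {

    // Hypothetical stand-in for any path that turns a clear-text term into an
    // index token, e.g. a query pipeline that ends in the encrypting filter.
    interface TermOracle {
        String encrypt(String term) throws GeneralSecurityException;
    }

    // Wraps raw key bytes (16, 24 or 32 of them) as an AES key.
    static SecretKey keyFromBytes(byte[] raw) {
        return new SecretKeySpec(raw, "AES");
    }

    // Deterministic on purpose: the same term and key always yield the same
    // token, which is what lets an encrypted query term match an indexed one.
    static String encryptTerm(String term, SecretKey key) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding"); // assumption, not RC4
        cipher.init(Cipher.ENCRYPT_MODE, key);
        byte[] encrypted = cipher.doFinal(term.getBytes(StandardCharsets.UTF_8));
        // Base64 turns the binary ciphertext into a String usable as term text.
        return Base64.getEncoder().encodeToString(encrypted);
    }

    // Inverse of encryptTerm, for whoever holds the key.
    static String decryptTerm(String token, SecretKey key) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
        cipher.init(Cipher.DECRYPT_MODE, key);
        byte[] plain = cipher.doFinal(Base64.getDecoder().decode(token));
        return new String(plain, StandardCharsets.UTF_8);
    }

    // The weakness: a caller who can get guesses encrypted builds a
    // token -> clear-text table without ever seeing the key.
    static Map<String, String> buildLookupTable(Iterable<String> guesses, TermOracle oracle)
            throws GeneralSecurityException {
        Map<String, String> tokenToTerm = new LinkedHashMap<>();
        for (String guess : guesses) {
            tokenToTerm.put(oracle.encrypt(guess), guess);
        }
        return tokenToTerm;
    }

    public static void main(String[] args) throws GeneralSecurityException {
        // Demo key only; a real key would never be hard-coded.
        SecretKey key = keyFromBytes("0123456789abcdef".getBytes(StandardCharsets.UTF_8));

        String indexed = encryptTerm("confidential", key); // at index time
        String queried = encryptTerm("confidential", key); // at query time
        System.out.println(indexed.equals(queried));       // true

        Map<String, String> table = buildLookupTable(
                Arrays.asList("public", "confidential", "secret"),
                term -> encryptTerm(term, key));
        System.out.println(table.get(indexed));            // confidential
    }
}
```

The first line printed shows why a query term encrypted with the same key matches the indexed term. The second shows the other side of that property: anyone who can get guesses encrypted, for example through the QueryParser path, can map index tokens back to clear text without holding the key.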