I agree with Nicolas. I think the overhead of decrypting such small payloads will have a serious impact on performance. I also think the scheme is subject to an easy attack, or else the index size will have to grow dramatically to avoid such small encryption blocks.
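To make the "easy attack" concrete, here is a minimal sketch. It assumes the patch encrypts every term under one fixed key with no per-term nonce, which is what allows an application-encrypted query term to match an indexed term (restriction 5 of the proposal quoted below). Under that assumption a stream cipher reuses a single keystream for every term, so XORing two encrypted terms cancels the keystream and exposes the XOR of the two plaintexts without any key. AES in CTR mode with a fixed IV is swapped in for the patch's RC4 because it is a standard JCE transformation; like RC4, it is a length-preserving stream cipher. The class `DeterministicTermCipher` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: per-term stream encryption with a fixed key and IV.
// AES/CTR stands in for RC4; both XOR the plaintext with a keystream, and
// with a fixed key/IV every term is XORed with the same keystream.
final class DeterministicTermCipher {
    private final SecretKeySpec key;
    private final IvParameterSpec iv;

    DeterministicTermCipher(byte[] rawKey, byte[] fixedIv) {
        this.key = new SecretKeySpec(rawKey, "AES");
        this.iv = new IvParameterSpec(fixedIv);
    }

    // The same term always yields the same bytes (and the same length),
    // which is what lets an encrypted query term match an indexed term.
    byte[] encryptTerm(String term) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, iv);
            return cipher.doFinal(term.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // XOR of the common prefix of two byte arrays.
    static byte[] xor(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        byte[] out = new byte[n];
        for (int i = 0; i < n; i++) {
            out[i] = (byte) (a[i] ^ b[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] key = new byte[16]; // all-zero demo key, for illustration only
        byte[] iv = new byte[16];  // fixed IV reused for every term
        DeterministicTermCipher c = new DeterministicTermCipher(key, iv);

        byte[] c1 = c.encryptTerm("salary");
        byte[] c2 = c.encryptTerm("secret");
        // The keystream cancels out: c1 XOR c2 == p1 XOR p2, no key needed.
        byte[] leaked = xor(c1, c2);
        byte[] expected = xor("salary".getBytes(StandardCharsets.UTF_8),
                              "secret".getBytes(StandardCharsets.UTF_8));
        System.out.println(Arrays.equals(leaked, expected)); // prints true
    }
}
```

Giving each term a random nonce would remove this leak, but the same term would then encrypt differently every time, so an application-encrypted query term could no longer match the indexed term exactly. That trade-off is at the heart of the objection.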
We use Lucene for indexing only and store the actual payloads elsewhere, so in our case your solution is not optimal.

-----Original Message-----
>From: Nicolas Lalevée <[EMAIL PROTECTED]>
>Sent: Dec 1, 2006 2:20 AM
>To: java-dev@lucene.apache.org
>Subject: Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted
>
>On Friday 1 December 2006 01:33, negrinv wrote:
>> Thank you Robert for your comments. I am inclined to agree with you, but I would first like to establish whether simplicity of implementation is the overriding consideration. Before I dwell on that, let me say that I have discovered that I am not a master of diff file creation with Eclipse. The diff file attached to my original posting is absurdly large and not correct. I have therefore attached a zip file containing the complete source code of the classes I modified. I leave it to others to extract the diffs properly.
>> Back to the issue. So far the implementation has not been difficult, considering that I knew nothing about Lucene internals before I started. The reason is that Lucene is very well structured, and the changes fitted nicely by adding some code in the right places with minimal changes to the existing code. But I admit that the proposed implementation is not yet complete, and more work is required to overcome some of its restrictions. While I like your idea, I believe that it imposes too coarse a granularity on the encrypted data: all fields with all kinds of data will be encrypted, including images and others that would normally be left alone, thus adding to the performance penalty due to encryption.
>
>I don't agree with you here. In Lucene, you will encrypt the field data, the field names, and the tokens: I would say that represents at least 2/3 of the index size.
>Then, with the implementation you suggest, I think (sorry, I didn't take the time to look at your patch) that every time Lucene data needs to be read, it is decrypted again. With an encrypted FS, your kernel will maintain a cache in RAM for you, so it won't hurt so much.
>It needs some benchmarks to see what is actually best, but I doubt that your solution will be faster.
>
>Nicolas.
>
>> Many hardware devices and most operating systems already provide directory or file system encryption, so that level of encryption appears to me to be an unnecessary addition to Lucene. Encryption at field level, however, is not provided by anything I know of. The key, in my opinion, is to decide what is best from the end user's point of view, but perhaps we need more discussion on this.
>> Victor
>>
>> http://www.nabble.com/file/4390/LuceneEncryptionMods.zip LuceneEncryptionMods.zip
>>
>> Robert Engels wrote:
>> > I think a simpler solution would be to create an EncryptedDirectory implementation of Directory, which requires a password to open/modify the directory.
>> >
>> > Far simpler, and if you are using encryption to begin with, you are probably encrypting most of the data anyway.
>> >
>> > -----Original Message-----
>> >
>> >>From: negrinv <[EMAIL PROTECTED]>
>> >>Sent: Nov 29, 2006 9:45 PM
>> >>To: java-dev@lucene.apache.org
>> >>Subject: Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted
>> >>
>> >>Thank you Luke for your comments and the references you supplied. I read through them and reached the following conclusions. There seems to be a philosophical issue about the boundary between a user application and the Lucene API: where should one start and the other stop?
>> >>The other issue is the significant difference between compression and encryption.
>> >>As far as the first issue is concerned, it is really a matter of personal choice and preference.
>> >>My feeling is that as long as adding functionality does not impair the performance of the API as a whole, it makes sense to add it to Lucene and thus simplify the task of the application developer. After all, application developers do not have to use all the features of the API, and they always have the option of subclassing, writing a better version of it if they can, or writing the functionality as part of the application, even if the API already provides that functionality. The API is there to make life easier for those developers who want to use it; nobody "has" to use it. The second issue is more technical. Compression simply compresses the stored data to save storage. The index itself is not compressed, therefore searching proceeds as normal. With encryption, however, you must encrypt the index as well as the stored data; otherwise one could reconstruct the source document from the index and thus defeat the purpose of encryption. Correct me if I am wrong, but I think that encrypting the Lucene index is not easy to achieve from outside Lucene: it implies rewriting, as part of the application, much code that is now part of Lucene (see issue number one above), hence my preference for including it as part of the Lucene API rather than as part of the application.
>> >>Victor
>> >>
>> >>Luke Nezda wrote:
>> >>> I think that adding encryption support to Lucene fields is a bad idea for the same reasons adding compression was a bad idea (see the conclusive comments at the tail of this issue: http://issues.apache.org/jira/browse/LUCENE-648?page=all). Binary fields can be used by users to achieve this end.
>> >>> Maybe a contrib with utility methods would be a compromise to preserve this work and make it accessible to others, or alternatively just a FAQ entry with the sample code or references to it.
>> >>> Luke
>> >>>
>> >>> On 11/29/06, negrinv <[EMAIL PROTECTED]> wrote:
>> >>>> Attached are proposed modifications to Lucene 2.0 to support Field.Store.Encrypted.
>> >>>> The rationale behind this proposal is simple. Since Lucene can store data in the index, it effectively makes the data portable. It is conceivable that some of the data may be sensitive in nature, hence the option to encrypt it.
>> >>>> Both the data and its index are encrypted in this implementation.
>> >>>> This is only an initial implementation. It has the following restrictions, all of which can be resolved if required, albeit with some effort and more changes to Lucene:
>> >>>> 1) Binary and compressed fields cannot also be encrypted (a plaintext, once encrypted, becomes binary).
>> >>>> 2) Field.Store.Encrypted implies Field.Store.YES.
>> >>>> This makes sense, but it forces one to store the data in the same index where the tokens are stored. It may be preferable at times to have two indices, one for tokens, the other for the data.
>> >>>> 3) As implemented, it uses RC4 encryption from BouncyCastle. This is an open source package, very simple to use, which has the advantage of guaranteeing that the length of the encrypted field is the same as that of the original plaintext. As of Java 1.5 (5.0), Sun provides an RC4 equivalent in its Java Cryptography Extension, but unfortunately not in Java 1.4.
>> >>>> The BouncyCastle RC4 is not the only algorithm available; others not depending on third-party code can be used, but it was just the simplest to implement for this first attempt.
>> >>>> 4) The attachments are modifications in diff form, based on an early (I think August or September '06) repository snapshot of Lucene 2.0, subsequently updated from the Lucene repository on 29/11/06. They may need some additional work to merge with the latest version in the Lucene repository. They also include a couple of JUnit test programs which explain, as well as test, the usage. You will need the BouncyCastle .jar (bcprov-jdk14-134.jar) to run them. I did not attach it, to minimize the size of the attachments, but it can be downloaded free from:
>> >>>> http://www.bouncycastle.org/latest_releases.html
>> >>>>
>> >>>> 5) Searching an encrypted field is restricted to single terms; no phrase or boolean searches are allowed yet, and the term has to be encrypted by the application before searching for it (ref. the attached JUnit test programs).
>> >>>>
>> >>>> To the extent that I have tested it, the code works as intended and does not appear to introduce any regression problems, but more testing by others would be desirable.
>> >>>> I don't propose at this stage to do any further work on these API extensions unless there is some expression of interest and direction from the Lucene developers team. I have an application ready to roll which uses the proposed Lucene encryption API additions (please see http://www.kbforge.com/index.html). The application is not yet available for downloading simply because I am not sure whether the Lucene licence allows me to do so. I would appreciate your advice in this regard.
>> >>>> My application is free, but its source code is not available (yet). I should add that encryption does not have to be an integral part of Lucene; it can be just part of the end application, but somehow it seems to me that Field.Store.Encrypted belongs in the same category as compression and binary values.
>> >>>> I would be happy to receive your feedback.
>> >>>>
>> >>>> victor negrin
>> >>>>
>> >>>> http://www.nabble.com/file/4376/luceneDiff2.txt luceneDiff2.txt
>> >>>> http://www.nabble.com/file/4377/TestEncryptedDocument.java TestEncryptedDocument.java
>> >>>> http://www.nabble.com/file/4378/TestDocument.java TestDocument.java
>> >>>> --
>> >>>> View this message in context: http://www.nabble.com/Attached-proposed-modifications-to-Lucene-2.0-to-support-Field.Store.Encrypted-tf2727614.html#a7607415
>> >>>> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>> >>>>
>> >>>> ---------------------------------------------------------------------
>> >>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> >>>> For additional commands, e-mail: [EMAIL PROTECTED]
>> >> >> >> >> >>--------------------------------------------------------------------- >> >>To unsubscribe, e-mail: [EMAIL PROTECTED] >> >>For additional commands, e-mail: [EMAIL PROTECTED] >> > >> > --------------------------------------------------------------------- >> > To unsubscribe, e-mail: [EMAIL PROTECTED] >> > For additional commands, e-mail: [EMAIL PROTECTED] > >-- >Nicolas LALEV�E >Solutions & Technologies >ANYWARE TECHNOLOGIES >Tel : +33 (0)5 61 00 52 90 >Fax : +33 (0)5 61 00 51 46 >http://www.anyware-tech.com > >--------------------------------------------------------------------- >To unsubscribe, e-mail: [EMAIL PROTECTED] >For additional commands, e-mail: [EMAIL PROTECTED] > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]