Thank you, Robert, for your comments. I am inclined to agree with you, but I would first like to establish whether simplicity of implementation is the overriding consideration.

Before I dwell on that, let me say that I have discovered that I am not a master of diff file creation with Eclipse. The diff file attached to my original posting is absurdly large and not correct. I have therefore attached a zip file containing the complete source code of the classes I modified. I leave it to others to extract the diffs properly.

Back to the issue. So far the implementation has not been difficult, considering that I knew nothing about Lucene internals before I started. The reason is that Lucene is very well structured, and the changes fitted nicely: I added some code in the right places with minimal changes to the existing code. But I admit that the proposed implementation is not yet complete, and more work is required to overcome some of its restrictions.

While I like your idea, I believe that it imposes too coarse a granularity on the encrypted data: all fields, with all kinds of data, will be encrypted, including images and other content which would normally be left alone, thus adding to the performance penalty due to encryption. Many hardware devices and most operating systems already provide directory or file system encryption, so that level of encryption appears to me an unnecessary addition to Lucene. Encryption at field level, however, is not provided by anything I know of.

The key, in my opinion, is to decide what is best from the end user's point of view, but perhaps we need more discussion on this.

Victor
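To make the granularity argument concrete, here is a minimal sketch of encrypting only selected fields of a document while leaving the rest (for example image data) untouched. It is not the code from the attached zip and it does not use the Lucene API: the class FieldLevelEncryptionSketch, its methods, and the field names are all hypothetical, and it assumes the JRE ships the JCE "ARCFOUR" (RC4) cipher rather than BouncyCastle. Reusing one RC4 key for several values, as done here for brevity, would not be acceptable in real use.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical illustration only: none of these names come from Lucene
// or from the attached modifications.
final class FieldLevelEncryptionSketch {

    // Applies RC4 ("ARCFOUR" in the JCE) to the data. RC4 is a stream cipher,
    // so applying it again with the same key restores the original bytes.
    static byte[] rc4(byte[] key, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance("ARCFOUR");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "ARCFOUR"));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("RC4 is not available in this JRE", e);
        }
    }

    // Returns a copy of the document in which only the fields named in
    // 'sensitive' are encrypted; every other field is passed through as-is.
    static Map<String, byte[]> encryptSelected(
            Map<String, byte[]> doc, Set<String> sensitive, byte[] key) {
        Map<String, byte[]> out = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> field : doc.entrySet()) {
            boolean encrypt = sensitive.contains(field.getKey());
            out.put(field.getKey(),
                    encrypt ? rc4(key, field.getValue()) : field.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] key = "0123456789abcdef".getBytes(StandardCharsets.UTF_8);

        Map<String, byte[]> doc = new LinkedHashMap<>();
        doc.put("notes", "sensitive text".getBytes(StandardCharsets.UTF_8));
        doc.put("thumbnail", new byte[] {1, 2, 3}); // stands in for image data

        Map<String, byte[]> stored =
                encryptSelected(doc, Collections.singleton("notes"), key);

        // Only "notes" pays the encryption cost; "thumbnail" is untouched.
        String recovered =
                new String(rc4(key, stored.get("notes")), StandardCharsets.UTF_8);
        System.out.println("recovered notes: " + recovered);
        System.out.println("thumbnail bytes: " + stored.get("thumbnail").length);
    }
}
```

With a directory-level scheme the equivalent of encryptSelected would have no 'sensitive' set at all: every byte written would go through the cipher.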
http://www.nabble.com/file/4390/LuceneEncryptionMods.zip LuceneEncryptionMods.zip

Robert Engels wrote:
>
> I think a simpler solution would be to create an EncryptedDirectory
> implementation of Directory, which requires a password to open/modify the
> directory.
>
> Far simpler, and if you are using encryption to begin with, you are
> probably encrypting most of the data anyway.
>
> -----Original Message-----
>>From: negrinv <[EMAIL PROTECTED]>
>>Sent: Nov 29, 2006 9:45 PM
>>To: java-dev@lucene.apache.org
>>Subject: Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted
>>
>>
>>Thank you Luke for your comments and the references you supplied. I read
>>through them and reached the following conclusions. There seems to be a
>>philosophical issue about the boundary between a user application and the
>>Lucene API: where should one start and the other stop?
>>The other issue is the significant difference between compression and
>>encryption.
>>As far as the first issue is concerned it is really a matter of personal
>>choice and preference. My feeling is that as long as adding functionality
>>does not impair the performance of the API as a whole, it makes sense to add
>>it to Lucene and thus simplify the task of the application developer. After
>>all, application developers do not have to use all the features of the API
>>and always have the option of subclassing, writing a better version of it if
>>they can, or writing the functionality as part of the application, even if
>>the API provides that functionality already. The API is there to make life
>>easier for those developers who want to use it, nobody "has" to use it.
>>The second issue is more technical. Compression simply compresses the stored
>>data to save storage. The index itself is not compressed, therefore searching
>>proceeds as normal.
>>With encryption, however, you must encrypt the index as
>>well as the stored data, otherwise one could reconstruct the source document
>>from the index and thus defeat the purpose of encryption. Correct me if I am
>>wrong, but I think that encrypting the Lucene index is not easy to achieve
>>from outside of Lucene: it implies re-writing as part of the application
>>much code now part of Lucene (see issue number one above), hence my
>>preference for including it as part of the Lucene API rather than as part of
>>the application.
>>Victor
>>
>>
>>Luke Nezda wrote:
>>>
>>> I think that adding encryption support to Lucene fields is a bad idea for
>>> the same reasons adding compression was a bad idea (conclusive comments on
>>> the tail of this issue
>>> http://issues.apache.org/jira/browse/LUCENE-648?page=all). Binary fields
>>> can be used by users to achieve this end. Maybe a contrib with utility
>>> methods would be a compromise to preserve this work and make it accessible
>>> to others, or alternatively just a faq entry with the sample code or
>>> references to it.
>>> Luke
>>>
>>> On 11/29/06, negrinv <[EMAIL PROTECTED]> wrote:
>>>>
>>>>
>>>> Attached are proposed modifications to Lucene 2.0 to support
>>>> Field.Store.Encrypted.
>>>> The rationale behind this proposal is simple. Since Lucene can store data
>>>> in the index, it effectively makes the data portable. It is conceivable
>>>> that some of the data may be sensitive in nature, hence the option to
>>>> encrypt it.
>>>> Both the data and its index are encrypted in this implementation.
>>>> This is only an initial implementation. It has the following
>>>> restrictions, all of which can be resolved if required, albeit with some
>>>> effort and more changes to Lucene:
>>>> 1) binary and compressed fields cannot be encrypted as well (a plaintext
>>>> once encrypted becomes binary).
>>>> 2) Field.Store.Encrypted implies Field.Store.Yes
>>>> This makes sense but it forces one to store the data in the same index
>>>> where the tokens are stored. It may be preferable at times to have two
>>>> indices, one for tokens, the other for the data.
>>>> 3) As implemented, it uses RC4 encryption from BouncyCastle. This is an
>>>> open source package, very simple to use, which has the advantage of
>>>> guaranteeing that the length of the encrypted field is the same as the
>>>> original plaintext. As of Java 1.5 (5.0) Sun provides an RC4 equivalent in
>>>> its Java Cryptography Extension, but unfortunately not in Java 1.4.
>>>> The BouncyCastle RC4 is not the only algorithm available; others not
>>>> depending on third party code can be used, but it was just the simplest
>>>> to implement for this first attempt.
>>>> 4) The attachments are modifications in diff form based on an early (I
>>>> think August or September '06) repository snapshot of Lucene 2.0
>>>> subsequently updated from the Lucene repository on 29/11/06. They may
>>>> need some additional work to merge with the latest version in the Lucene
>>>> repository. They also include a couple of JUnit test programs which
>>>> explain, as well as test, the usage. You will need the BouncyCastle .jar
>>>> (bcprov-jdk14-134.jar) to run them. I did not attach it to minimize the
>>>> size of the attachments, but it can be downloaded free from:
>>>> http://www.bouncycastle.org/latest_releases.html
>>>>
>>>> 5) Searching an encrypted field is restricted to single terms, no phrase
>>>> or boolean searches allowed yet, and the term has to be encrypted by the
>>>> application before searching it. (ref. attached JUnit test programs)
>>>>
>>>> To the extent that I have tested it, the code works as intended and does
>>>> not appear to introduce any regression problems, but more testing by
>>>> others would be desirable.
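As an illustration of restrictions 3) and 5) in the quoted proposal, the following is a rough sketch of why a single-term search can work against encrypted terms: if every term is encrypted on its own with the same key, equal terms produce equal ciphertext, so an exact lookup succeeds once the application has encrypted the query term. It also shows that RC4 output has the same byte length as its input. This is not the attached JUnit code and not Lucene's index format. The class EncryptedTermLookupSketch and its methods are hypothetical, the "index" is a plain in-memory map, and the JCE "ARCFOUR" cipher is assumed in place of BouncyCastle. The determinism that makes the lookup possible also means this scheme leaks term equality and reuses the RC4 keystream, so it should be read as a demonstration only.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical illustration only: a toy inverted index keyed by encrypted terms.
final class EncryptedTermLookupSketch {
    private final byte[] key;
    // hex-encoded encrypted term -> ids of documents containing that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    EncryptedTermLookupSketch(byte[] key) {
        this.key = key.clone();
    }

    // Encrypts one term with RC4 and returns it hex-encoded. The ciphertext has
    // as many bytes as the UTF-8 plaintext, so the hex string is twice as long.
    static String encryptTerm(byte[] key, String term) {
        try {
            Cipher cipher = Cipher.getInstance("ARCFOUR");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "ARCFOUR"));
            byte[] encrypted = cipher.doFinal(term.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(encrypted.length * 2);
            for (byte b : encrypted) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("RC4 is not available in this JRE", e);
        }
    }

    // Splits the text on whitespace and stores each term in encrypted form only.
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(encryptTerm(key, term), k -> new TreeSet<>())
                    .add(docId);
        }
    }

    // Exact match on an already encrypted term; the caller does the encryption.
    Set<Integer> search(String encryptedTerm) {
        return postings.getOrDefault(encryptedTerm, Collections.<Integer>emptySet());
    }

    public static void main(String[] args) {
        byte[] key = "0123456789abcdef".getBytes(StandardCharsets.UTF_8);
        EncryptedTermLookupSketch index = new EncryptedTermLookupSketch(key);
        index.add(1, "confidential merger plan");
        index.add(2, "public merger notice");

        // The plaintext term finds nothing; the encrypted term finds both documents.
        System.out.println(index.search("merger"));
        System.out.println(index.search(encryptTerm(key, "merger")));
    }
}
```

Because each term is encrypted in isolation, nothing in this sketch preserves term positions, which gives one plausible reason (an assumption, not something stated in the proposal) why phrase queries would need extra work.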
>>>> I don't propose at this stage to do any further work on these API
>>>> extensions unless there is some expression of interest and direction from
>>>> the Lucene Developers team. I have an application ready to roll which
>>>> uses the proposed Lucene encryption API additions (please see
>>>> http://www.kbforge.com/index.html). The application is not yet available
>>>> for downloading simply because I am not sure if the Lucene licence allows
>>>> me to do so. I would appreciate your advice in this regard. My
>>>> application is free but its source code is not available (yet). I should
>>>> add that encryption does not have to be an integral part of Lucene, it
>>>> can be just part of the end application, but somehow it seems to me that
>>>> Field.Store.Encrypted belongs in the same category as compression and
>>>> binary values.
>>>> I would be happy to receive your feedback.
>>>>
>>>> victor negrin
>>>>
>>>> http://www.nabble.com/file/4376/luceneDiff2.txt luceneDiff2.txt
>>>> http://www.nabble.com/file/4377/TestEncryptedDocument.java TestEncryptedDocument.java
>>>> http://www.nabble.com/file/4378/TestDocument.java TestDocument.java
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/Attached-proposed-modifications-to-Lucene-2.0-to-support-Field.Store.Encrypted-tf2727614.html#a7607415
>>>> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>>> For additional commands, e-mail: [EMAIL PROTECTED]
>>>>
>>>>
>>>
>>>
>>
>>--
>>View this message in context: http://www.nabble.com/Attached-proposed-modifications-to-Lucene-2.0-to-support-Field.Store.Encrypted-tf2727614.html#a7613046
>>Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
--
View this message in context: http://www.nabble.com/Attached-proposed-modifications-to-Lucene-2.0-to-support-Field.Store.Encrypted-tf2727614.html#a7631251
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.