Luke, I should have mentioned in my earlier posting that what I am proposing uses password-based encryption, where the password is NOT stored anywhere within Lucene. I deliberately avoided any reference to security (as opposed to encryption) because I believe security is the responsibility of the end application, not of Lucene. In my opinion, Lucene can only provide encryption services. None of the encryption APIs themselves, whether written by a third party or by Sun, can guarantee security either, which is why Lucene cannot guarantee it. What it can do is encrypt the data and its index. Any application using these proposed API extensions will have to work out the extent to which it can provide security within the context of all the other APIs involved and the application's own requirements.

I have to agree with you that at some stage Lucene will have to stop adding new functionality or it will become unmaintainable. But has it reached that stage yet?

Victor
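The thread does not say how the password is turned into a key. As a minimal sketch, assuming PBKDF2 from the standard JDK (`PBKDF2WithHmacSHA256`, available since Java 8 and so not in the Java 1.4/5.0 environments discussed here), a key can be re-derived from the password on every run, so that only a salt and an iteration count, never the password, would need to be kept alongside the index. The class name `PasswordKeyDemo` and the parameter values are hypothetical.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper: derives an AES key from a password; the password itself is never persisted. */
public class PasswordKeyDemo {

    /** Derives a 128-bit AES key from the password, salt and iteration count. */
    public static SecretKey deriveKey(char[] password, byte[] salt, int iterations) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, iterations, 128);
        try {
            SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
            byte[] raw = factory.generateSecret(spec).getEncoded();
            return new SecretKeySpec(raw, "AES");
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2 is not available", e);
        } finally {
            spec.clearPassword(); // wipe the spec's internal copy of the password
        }
    }

    /** Generates a random salt; unlike the password, the salt may be stored with the index. */
    public static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    public static void main(String[] args) {
        byte[] salt = newSalt();
        char[] password = "correct horse".toCharArray();
        SecretKey first = deriveKey(password, salt, 100_000);
        SecretKey second = deriveKey(password, salt, 100_000);
        // Same password, salt and iteration count always give the same key,
        // so the key can be re-derived at run time instead of being stored.
        System.out.println(Arrays.equals(first.getEncoded(), second.getEncoded()));
        Arrays.fill(password, '\0'); // clear the caller's copy as well
    }
}
```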
Luke Nezda wrote:
>
> Victor-
> Your point is well taken that a comprehensive encryption strategy is not quite analogous to compression: it involves more than transforming field values into a more compact form, since it requires (at a minimum) that all data structures comprising the index be encrypted too. Maybe I spoke too soon.
>
> However, after considering this more, I think the scheme would need to be quite invasive to provide good security. I think just plugging in encryption simplistically would be very vulnerable to side-channel attacks. It seems the attacker can get clear-text terms encrypted via the particular index's QueryParser implementation and eventually create a fairly complete decryption lookup table using Lucene's data structures, thus undermining the security of the internal data structures (encrypted payloads would potentially be unaffected, unless they corresponded to index Terms).
>
> Let's say this weakness is OK with you. Using the current API, I think you can achieve your ends by encrypting binary field values and adding a trailing org.apache.lucene.analysis.TokenFilter, used at both index and query time, that encrypts and Base64-encodes its input (which has to be a String). This would effectively give you an encrypted form of Lucene's internal data structures.
>
> In addition to my security concerns with the concept, I still agree with the related philosophical issues put forward so far on the related field compression topic. It seems inevitable to me that if encryption support were added, application developers would eventually try to sell Lucene developers on adding features to it, in addition to supporting and maintaining it (à la a configurable compression quality factor). A configurable, encrypting Base64 TokenFilter would also be a cool contrib.
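The per-token step of the encrypting Base64 TokenFilter Luke describes can be sketched without Lucene. This is a minimal sketch, assuming AES in ECB mode from the standard JDK, chosen only because it is deterministic (the same term always yields the same token, which is what lets an encrypted query term match an encrypted index term), and `java.util.Base64` (Java 8+). The Lucene-specific wrapper, a TokenFilter subclass that would call `encryptTerm` on each token's text, is left out; `TermEncryptor` and its methods are hypothetical. The same determinism is what Luke's attack relies on: whoever can get chosen terms encrypted, for example through the query path, can map tokens back to plaintext, as `buildLookupTable` shows.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical per-token transform for an encrypting Base64 TokenFilter. */
public class TermEncryptor {
    private final SecretKey key;

    public TermEncryptor(byte[] rawKey) {
        this.key = new SecretKeySpec(rawKey, "AES");
    }

    /** Deterministically encrypts a term and Base64-encodes it so the result is a String term. */
    public String encryptTerm(String term) {
        try {
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, key);
            byte[] ciphertext = cipher.doFinal(term.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(ciphertext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reverses encryptTerm for a key holder. */
    public String decryptTerm(String token) {
        try {
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(Cipher.DECRYPT_MODE, key);
            byte[] plaintext = cipher.doFinal(Base64.getDecoder().decode(token));
            return new String(plaintext, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Luke's concern: with an encryption oracle, guessed terms map tokens back to plaintext. */
    public static Map<String, String> buildLookupTable(TermEncryptor oracle, List<String> guesses) {
        Map<String, String> table = new HashMap<>();
        for (String guess : guesses) {
            table.put(oracle.encryptTerm(guess), guess);
        }
        return table;
    }

    public static void main(String[] args) {
        TermEncryptor enc = new TermEncryptor(new byte[16]); // all-zero key, for the demo only
        String indexed = enc.encryptTerm("diagnosis"); // produced at index time
        String queried = enc.encryptTerm("diagnosis"); // produced at query time
        System.out.println(indexed.equals(queried));   // tokens match, so term search works
        Map<String, String> table = buildLookupTable(enc, Arrays.asList("diagnosis", "cancer"));
        System.out.println(table.get(indexed));        // the attacker recovers the plaintext term
    }
}
```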
>
> Luke
>
> On 11/29/06, negrinv <[EMAIL PROTECTED]> wrote:
>>
>> Thank you Luke for your comments and the references you supplied. I read through them and reached the following conclusions. There seems to be a philosophical issue about the boundary between a user application and the Lucene API: where should one start and the other stop? The other issue is the significant difference between compression and encryption.
>>
>> As far as the first issue is concerned, it is really a matter of personal choice and preference. My feeling is that as long as adding functionality does not impair the performance of the API as a whole, it makes sense to add it to Lucene and thus simplify the application developer's task. After all, application developers do not have to use all the features of the API, and they always have the option of subclassing, writing a better version if they can, or writing the functionality as part of the application, even if the API already provides it. The API is there to make life easier for those developers who want to use it; nobody "has" to use it.
>>
>> The second issue is more technical. Compression simply compresses the stored data to save storage. The index itself is not compressed, so searching proceeds as normal. With encryption, however, you must encrypt the index as well as the stored data; otherwise one could reconstruct the source document from the index and thus defeat the purpose of encryption. Correct me if I am wrong, but I think that encrypting the Lucene index is not easy to achieve from outside Lucene: it implies rewriting, as part of the application, much code that is now part of Lucene (see issue number one above), hence my preference for including it in the Lucene API rather than in the application.
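Victor's point that an unencrypted index leaks the source text can be illustrated with a toy positional index, a much-simplified stand-in for Lucene's term and position data. It only lowercases and splits on whitespace; a real analyzer may drop stop words or stem, which would make the reconstruction approximate rather than exact. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy positional index showing how stored term positions alone reveal the source text. */
public class IndexReconstructionDemo {

    /** Maps each lowercased, whitespace-separated term to the positions where it occurs. */
    public static Map<String, List<Integer>> buildIndex(String text) {
        Map<String, List<Integer>> postings = new TreeMap<>(); // sorted, like a term dictionary
        String[] tokens = text.toLowerCase().trim().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], t -> new ArrayList<>()).add(pos);
        }
        return postings;
    }

    /** Rebuilds the token sequence by writing each term back at its recorded positions. */
    public static String reconstruct(Map<String, List<Integer>> postings) {
        int maxPos = -1;
        for (List<Integer> positions : postings.values()) {
            for (int p : positions) {
                maxPos = Math.max(maxPos, p);
            }
        }
        String[] slots = new String[maxPos + 1];
        for (Map.Entry<String, List<Integer>> entry : postings.entrySet()) {
            for (int p : entry.getValue()) {
                slots[p] = entry.getKey();
            }
        }
        return String.join(" ", slots);
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> index = buildIndex("Patient was diagnosed with the flu");
        System.out.println(index);              // what someone reading the raw index sees
        System.out.println(reconstruct(index)); // the original sentence, lowercased
    }
}
```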
>> Victor
>>
>>
>> Luke Nezda wrote:
>> >
>> > I think that adding encryption support to Lucene fields is a bad idea for the same reasons adding compression was a bad idea (see the conclusive comments at the tail of this issue: http://issues.apache.org/jira/browse/LUCENE-648?page=all). Binary fields can be used by users to achieve this end. Maybe a contrib with utility methods would be a compromise to preserve this work and make it accessible to others, or alternatively just a FAQ entry with the sample code or references to it.
>> > Luke
>> >
>> > On 11/29/06, negrinv <[EMAIL PROTECTED] > wrote:
>> >>
>> >> Attached are proposed modifications to Lucene 2.0 to support Field.Store.Encrypted.
>> >> The rationale behind this proposal is simple. Since Lucene can store data in the index, it effectively makes the data portable. It is conceivable that some of the data may be sensitive in nature, hence the option to encrypt it. Both the data and its index are encrypted in this implementation.
>> >> This is only an initial implementation. It has the following restrictions, all of which can be resolved if required, albeit with some effort and more changes to Lucene:
>> >> 1) Binary and compressed fields cannot also be encrypted (a plaintext, once encrypted, becomes binary).
>> >> 2) Field.Store.Encrypted implies Field.Store.Yes.
>> >> This makes sense, but it forces one to store the data in the same index where the tokens are stored. It may be preferable at times to have two indices, one for the tokens, the other for the data.
>> >> 3) As implemented, it uses RC4 encryption from BouncyCastle. This is an open-source package, very simple to use, which has the advantage of guaranteeing that the length of the encrypted field is the same as that of the original plaintext.
>> >> As of Java 1.5 (5.0), Sun provides an RC4 equivalent in its Java Cryptography Extension, but unfortunately not in Java 1.4. BouncyCastle's RC4 is not the only algorithm available; others that do not depend on third-party code can be used, but it was simply the easiest to implement for this first attempt.
>> >> 4) The attachments are modifications in diff form based on an early (I think August or September '06) repository snapshot of Lucene 2.0, subsequently updated from the Lucene repository on 29/11/06. They may need some additional work to merge with the latest version in the Lucene repository. They also include a couple of JUnit test programs which explain, as well as test, the usage. You will need the BouncyCastle .jar (bcprov-jdk14-134.jar) to run them. I did not attach it, to minimize the size of the attachments, but it can be downloaded free from:
>> >> http://www.bouncycastle.org/latest_releases.html
>> >>
>> >> 5) Searching an encrypted field is restricted to single terms (no phrase or boolean searches yet), and the term has to be encrypted by the application before searching for it (ref. the attached JUnit test programs).
>> >>
>> >> To the extent that I have tested it, the code works as intended and does not appear to introduce any regressions, but more testing by others would be desirable.
>> >> I don't propose at this stage to do any further work on these API extensions unless there is some expression of interest and direction from the Lucene Developers team. I have an application ready to roll which uses the proposed Lucene encryption API additions (please see http://www.kbforge.com/index.html). The application is not yet available for downloading simply because I am not sure whether the Lucene licence allows me to do so.
>> >> I would appreciate your advice in this regard. My application is free, but its source code is not available (yet). I should add that encryption does not have to be an integral part of Lucene; it can be just part of the end application, but somehow it seems to me that Field.Store.Encrypted belongs in the same category as compression and binary values.
>> >> I would be happy to receive your feedback.
>> >>
>> >> victor negrin
>> >>
>> >> http://www.nabble.com/file/4376/luceneDiff2.txt luceneDiff2.txt
>> >> http://www.nabble.com/file/4377/TestEncryptedDocument.java TestEncryptedDocument.java
>> >> http://www.nabble.com/file/4378/TestDocument.java TestDocument.java
>> >> --
>> >> View this message in context:
>> >> http://www.nabble.com/Attached-proposed-modifications-to-Lucene-2.0-to-support-Field.Store.Encrypted-tf2727614.html#a7607415
>> >> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> >> For additional commands, e-mail: [EMAIL PROTECTED]
>> >
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Attached-proposed-modifications-to-Lucene-2.0-to-support-Field.Store.Encrypted-tf2727614.html#a7613046
>> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>

--
View this message in context: http://www.nabble.com/Attached-proposed-modifications-to-Lucene-2.0-to-support-Field.Store.Encrypted-tf2727614.html#a7634221
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
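Restrictions 3 and 5 in Victor's original proposal, quoted above, can be illustrated together. This is a minimal sketch, not his patch: it assumes the JDK's own `ARCFOUR` cipher (the SunJCE name for RC4, i.e. the Java 5 equivalent he mentions) instead of BouncyCastle, re-initializes the cipher with the same key for every term so that encryption is deterministic, and uses a toy in-memory map from encrypted term to document ids rather than a Lucene index. The application encrypts the single query term before the lookup, and the ciphertext has the same byte length as the plaintext. `EncryptedTermIndex` and its methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

/** Toy encrypted term index: single-term lookups only, query term encrypted by the caller. */
public class EncryptedTermIndex {
    private final SecretKeySpec key;
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    public EncryptedTermIndex(byte[] rawKey) {
        this.key = new SecretKeySpec(rawKey, "ARCFOUR");
    }

    /** RC4-encrypts a term; a fresh cipher per call restarts the keystream, so output is deterministic. */
    public byte[] encrypt(String term) {
        try {
            Cipher rc4 = Cipher.getInstance("ARCFOUR");
            rc4.init(Cipher.ENCRYPT_MODE, key);
            return rc4.doFinal(term.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Indexes each whitespace-separated term of a document in encrypted form only. */
    public void addDocument(int docId, String text) {
        for (String term : text.toLowerCase().split("\\s+")) {
            String token = Base64.getEncoder().encodeToString(encrypt(term));
            postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
        }
    }

    /** Looks up one already-encrypted term; phrase and boolean queries are not supported. */
    public Set<Integer> search(byte[] encryptedTerm) {
        String token = Base64.getEncoder().encodeToString(encryptedTerm);
        return postings.getOrDefault(token, new TreeSet<>());
    }

    public static void main(String[] args) {
        byte[] demoKey = "demo-key-16bytes".getBytes(StandardCharsets.UTF_8); // demo only
        EncryptedTermIndex index = new EncryptedTermIndex(demoKey);
        index.addDocument(1, "quarterly revenue figures");
        index.addDocument(2, "revenue forecast");
        byte[] query = index.encrypt("revenue"); // the application encrypts the term first
        // RC4 is a stream cipher, so the ciphertext is exactly as long as the plaintext bytes.
        System.out.println(query.length == "revenue".getBytes(StandardCharsets.UTF_8).length);
        System.out.println(index.search(query)); // ids of documents containing "revenue"
    }
}
```

Re-initializing the cipher for every term restarts the RC4 keystream, which is what makes index-time and query-time ciphertexts identical. The flip side is that every term is XORed with the same keystream, so this is exactly the deterministic setting in which Luke's lookup-table concern applies.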