Thank you Luke for your comments and the references you supplied. I read
through them and reached the following conclusions. There seems to be a
philosophical issue about the boundary between a user application and the
Lucene API: where should one stop and the other begin?
The other issue is the significant difference between compression and
encryption.
As far as the first issue is concerned, it is really a matter of personal
choice and preference. My feeling is that as long as adding functionality
does not impair the performance of the API as a whole, it makes sense to add
it to Lucene and thus simplify the task of the application developer. After
all, application developers do not have to use all the features of the API
and always have the option of subclassing, writing a better version of it if
they can, or writing the functionality as part of the application, even if
the API provides that functionality already. The API is there to make life
easier for those developers who want to use it; nobody "has" to use it.
The second issue is more technical. Compression simply compresses the stored
data to save storage. The index itself is not compressed, so searching
proceeds as normal. With encryption, however, you must encrypt the index as
well as the stored data; otherwise one could reconstruct the source document
from the index and thus defeat the purpose of encryption. Correct me if I am
wrong, but I think that encrypting the Lucene index is not easy to achieve
from outside of Lucene: it implies rewriting, as part of the application,
much code that is now part of Lucene (see the first issue above). Hence my
preference for including it as part of the Lucene API rather than as part of
the application.
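
To make the reconstruction point concrete, here is a minimal sketch in plain
Java. It does not use any Lucene classes: the class PositionalIndexLeak and
its methods are hypothetical, and the toy index only mimics the idea of
recording each term together with its token positions. Under that assumption,
the token stream can be rebuilt from the index alone, without the stored text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration, not Lucene code: a toy positional inverted index.
final class PositionalIndexLeak {

    // Maps each term to the list of token positions at which it occurs.
    static Map<String, List<Integer>> index(String text) {
        Map<String, List<Integer>> postings = new TreeMap<>();
        String[] tokens = text.toLowerCase().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], t -> new ArrayList<>()).add(pos);
        }
        return postings;
    }

    // Rebuilds the token stream from the postings alone, without the stored text.
    static String reconstruct(Map<String, List<Integer>> postings) {
        TreeMap<Integer, String> byPosition = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : postings.entrySet()) {
            for (int pos : entry.getValue()) {
                byPosition.put(pos, entry.getKey());
            }
        }
        return String.join(" ", byPosition.values());
    }

    public static void main(String[] args) {
        String secret = "transfer the funds to the offshore account";
        // Only the postings are passed to reconstruct(), never the original string.
        System.out.println(reconstruct(index(secret)));
    }
}
```

Punctuation and case are lost in this toy version, but the wording of the
document survives, which is why leaving the terms in clear text would undo
the benefit of encrypting the stored field.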
Victor


Luke Nezda wrote:
> 
> I think that adding encryption support to Lucene fields is a bad idea for
> the same reasons adding compression was a bad idea (conclusive comments on
> the tail of this issue
> http://issues.apache.org/jira/browse/LUCENE-648?page=all).  Binary fields
> can be used by users to achieve this end.  Maybe a contrib with utility
> methods would be a compromise to preserve this work and make it accessible
> to others, or alternatively just a FAQ entry with the sample code or
> references to it.
> Luke
> 
> On 11/29/06, negrinv <[EMAIL PROTECTED]> wrote:
>>
>>
>> Attached are proposed modifications to Lucene 2.0 to support
>> Field.Store.Encrypted.
>> The rationale behind this proposal is simple. Since Lucene can store data in
>> the index, it effectively makes the data portable. It is conceivable that
>> some of the data may be sensitive in nature, hence the option to encrypt it.
>> Both the data and its index are encrypted in this implementation.
>> This is only an initial implementation. It has the following
>> restrictions, all of which can be resolved if required, albeit with some
>> effort and more changes to Lucene:
>> 1) binary and compressed fields cannot be encrypted as well (a plaintext
>> once encrypted becomes binary).
>> 2) Field.Store.Encrypted implies Field.Store.Yes.
>> This makes sense, but it forces one to store the data in the same index where
>> the tokens are stored. It may be preferable at times to have two indices,
>> one for tokens, the other for the data.
>> 3) As implemented, it uses RC4 encryption from BouncyCastle. This is an open
>> source package, very simple to use, and RC4 has the advantage of guaranteeing
>> that the length of the encrypted field is the same as the original
>> plaintext. As of Java 1.5 (5.0) Sun provides an RC4 equivalent in its Java
>> Cryptography Extension, but unfortunately not in Java 1.4.
>> The BouncyCastle RC4 is not the only algorithm available; others not
>> depending on third-party code can be used, but it was just the simplest to
>> implement for this first attempt.
>> 4) The attachments are modifications in diff form based on an early (I
>> think August or September '06) repository snapshot of Lucene 2.0,
>> subsequently updated from the Lucene repository on 29/11/06. They may need
>> some additional work to merge with the latest version in the Lucene
>> repository. They also include a couple of JUnit test programs which explain,
>> as well as test, the usage. You will need the BouncyCastle .jar
>> (bcprov-jdk14-134.jar) to run them. I did not attach it to minimize the size
>> of the attachments, but it can be downloaded free from:
>> http://www.bouncycastle.org/latest_releases.html
>>
>> 5) Searching an encrypted field is restricted to single terms; no phrase or
>> boolean searches are allowed yet, and the term has to be encrypted by the
>> application before searching for it (ref. attached JUnit test programs).
>>
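
Items 3 and 5 can be illustrated with a small sketch. It is not taken from the
attached patch: it swaps in the RC4 cipher that the JCE exposes under the name
"ARCFOUR" instead of BouncyCastle, and the class Rc4TermSketch, the demo key
and the sample term are all hypothetical. It shows the two properties the
proposal relies on: the ciphertext has the same length as the plaintext bytes,
and encrypting the same term with the same key gives the same bytes, which is
what lets an application encrypt a query term and match it against the index.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper, not part of the attached patch.
final class Rc4TermSketch {

    // Encrypts the UTF-8 bytes of a term with RC4 (JCE algorithm name "ARCFOUR").
    static byte[] encrypt(byte[] key, String term) {
        try {
            Cipher rc4 = Cipher.getInstance("ARCFOUR");
            rc4.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "ARCFOUR"));
            return rc4.doFinal(term.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("RC4 is not available in this JRE", e);
        }
    }

    public static void main(String[] args) {
        // Demo key only (16 bytes); a real application would manage keys properly.
        byte[] key = "demo-key-16bytes".getBytes(StandardCharsets.UTF_8);

        byte[] indexedTerm = encrypt(key, "confidential"); // done at indexing time
        byte[] queryTerm = encrypt(key, "confidential");   // done by the application at search time

        // A stream cipher adds no padding, so the length is preserved.
        System.out.println("same length: " + (indexedTerm.length == "confidential".length()));
        // Same key and same plaintext give the same bytes, so an exact term match works.
        System.out.println("terms match: " + Arrays.equals(indexedTerm, queryTerm));
    }
}
```

The same determinism means the keystream is reused for every term encrypted
under one key, which is a known weakness of using a stream cipher this way and
worth weighing before relying on the scheme.
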
>> To the extent that I have tested it, the code works as intended and does not
>> appear to introduce any regression problems, but more testing by others
>> would be desirable.
>> I don't propose at this stage to do any further work on these API
>> extensions unless there is some expression of interest and direction from
>> the Lucene Developers team. I have an application ready to roll which uses
>> the proposed Lucene encryption API additions (please see
>> http://www.kbforge.com/index.html). The application is not yet available for
>> downloading simply because I am not sure if the Lucene licence allows me to
>> do so. I would appreciate your advice in this regard. My application is free
>> but its source code is not available (yet). I should add that encryption
>> does not have to be an integral part of Lucene; it can be just part of the
>> end application, but somehow it seems to me that Field.Store.Encrypted
>> belongs in the same category as compression and binary values.
>> I would be happy to receive your feedback.
>>
>> victor negrin
>>
>> http://www.nabble.com/file/4376/luceneDiff2.txt luceneDiff2.txt
>> http://www.nabble.com/file/4377/TestEncryptedDocument.java
>> TestEncryptedDocument.java
>> http://www.nabble.com/file/4378/TestDocument.java TestDocument.java
>> --
>> View this message in context:
>> http://www.nabble.com/Attached-proposed-modifications-to-Lucene-2.0-to-support-Field.Store.Encrypted-tf2727614.html#a7607415
>> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
>>
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Attached-proposed-modifications-to-Lucene-2.0-to-support-Field.Store.Encrypted-tf2727614.html#a7613046
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

