Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

robert engels Tue, 05 Dec 2006 13:38:34 -0800

If it is only meant to protect from "prying eyes", a simple field-level analyzer that does a simple xor/rotation should suffice. It will be much faster and simpler.

Going beyond that, your solution is not very secure, as has been pointed out, so you might as well just use the simplest solution.
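
For illustration, here is a minimal sketch of the kind of xor/rotation transform suggested above. It uses only the JDK and is not wired into a Lucene analyzer; the class name XorObfuscator, the key, and the 3-bit rotation are assumptions made for this example. It hides text from casual inspection only and offers no cryptographic protection.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: obfuscates a field value with XOR plus a bit rotation.
// This is obfuscation against "prying eyes", not encryption.
final class XorObfuscator {
    // Assumed demo key; anyone who reads this class can reverse the transform.
    private static final byte[] KEY = "not-a-secret".getBytes(StandardCharsets.UTF_8);

    static String obfuscate(String plain) {
        byte[] in = plain.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i++) {
            int x = (in[i] ^ KEY[i % KEY.length]) & 0xFF;
            // Rotate the byte left by 3 bits.
            out[i] = (byte) (((x << 3) | (x >>> 5)) & 0xFF);
        }
        // Base64 keeps the result printable so it can be stored as a term or field value.
        return Base64.getUrlEncoder().withoutPadding().encodeToString(out);
    }

    static String reveal(String token) {
        byte[] in = Base64.getUrlDecoder().decode(token);
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i++) {
            int x = in[i] & 0xFF;
            // Rotate the byte right by 3 bits, then undo the XOR.
            int r = ((x >>> 3) | (x << 5)) & 0xFF;
            out[i] = (byte) (r ^ KEY[i % KEY.length]);
        }
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String token = obfuscate("account balance");
        System.out.println(token + " -> " + reveal(token));
    }
}
```

Because the transform is deterministic, the same input always yields the same token, which is what allows exact-match lookups on the obfuscated value.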



On Dec 5, 2006, at 3:28 PM, negrinv wrote:

Chris Hostetter wrote:
Compression of stored fields is a feature that the Lucene "core" currently supports out of the box -- but it does so in a very limited manner that doesn't allow for much configuration. There is no advantage for users in using compressed fields over compressing the data themselves before adding it to the index, only disadvantages: notably the limited control the user has over the compression, and added complexity for the code path executed by all users -- even if they don't use compression (a boolean test on "compressed" in FieldsReader may be fast ... but it's still a bytecode op for every field that's completely unnecessary for a large portion of the user base)
If the code was not already in the core, and someone asked about adding it, I would argue against doing so on the grounds that some helpful utility methods (possibly in a contrib) would be just as useful, and would have no performance cost for people who don't care about compression.
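
As a rough sketch of what "compressing the data themselves before adding it to the index" could look like, the helper below uses java.util.zip from the JDK. The class name FieldCompression and its methods are hypothetical and are not part of Lucene; the resulting bytes would then be stored by the application however it sees fit.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical utility: the caller picks the compression level,
// which is the kind of control a built-in flag does not expose.
final class FieldCompression {
    static byte[] compress(String text, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    static String decompress(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    // Avoid spinning forever on truncated or unsupported input.
                    throw new IllegalArgumentException("truncated or unsupported data");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid deflate data", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] packed = compress("lucene lucene lucene lucene", Deflater.BEST_COMPRESSION);
        System.out.println(packed.length + " bytes -> " + decompress(packed));
    }
}
```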
Perhaps, if you look at compression on its own, but once you see compression in the context of all the other field options it makes sense to have it added to Lucene. It's about having everything in one place: the ease of implementation offsets the performance issue, in my opinion.
First off, if all we are interested in is encrypting *stored* data, then the issue becomes exactly the same as compression: there is no point in putting this functionality in the "core" Lucene code base when it can be done using helper utility methods -- now that that's out of the way, let's talk about the good stuff...
As above
If we want to encrypt the text portion of Terms that are indexed for a specific set of fields, this is again something that can easily be done without modifying the "core" Lucene code base -- utility methods can be used to help people encrypt UN_TOKENIZED Field values, and a simple AnalyzerWrapper can be made to encrypt the text portion of Tokens produced by another analyzer both when indexing Field values and when QueryParser is analyzing input text if necessary.
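
The utility-method idea above can be sketched as follows. The point is that the same term text must always encrypt to the same token, so that a term encrypted at query time matches the one encrypted at index time. TermCipher is a hypothetical class using only the JDK; it is not Lucene API, the analyzer wrapper itself is omitted, and AES in ECB mode is chosen here only because it is deterministic, which is also what makes it weak.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: deterministic encryption of term text so that
// equal plaintext terms produce equal, searchable tokens.
final class TermCipher {
    private final SecretKeySpec key;

    TermCipher(String passphrase) {
        this.key = deriveKey(passphrase);
    }

    // Assumed key derivation for the demo: first 16 bytes of SHA-256(passphrase).
    private static SecretKeySpec deriveKey(String passphrase) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(passphrase.getBytes(StandardCharsets.UTF_8));
            return new SecretKeySpec(Arrays.copyOf(digest, 16), "AES");
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    String encrypt(String term) {
        byte[] out = run(Cipher.ENCRYPT_MODE, term.getBytes(StandardCharsets.UTF_8));
        return Base64.getUrlEncoder().withoutPadding().encodeToString(out);
    }

    String decrypt(String token) {
        byte[] out = run(Cipher.DECRYPT_MODE, Base64.getUrlDecoder().decode(token));
        return new String(out, StandardCharsets.UTF_8);
    }

    private byte[] run(int mode, byte[] input) {
        try {
            // ECB has no IV, so identical input always gives identical output.
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(mode, key);
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        TermCipher cipher = new TermCipher("demo passphrase");
        String indexed = cipher.encrypt("smith");   // done when indexing
        String queried = cipher.encrypt("smith");   // done when building the query
        System.out.println(indexed.equals(queried) + " " + cipher.decrypt(indexed));
    }
}
```

In an application, the same encrypt call would be applied to the field value before indexing and to the user's search term before querying.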
I take your word for it, but wouldn't you agree that replacing all the above with just one line, "Field.Store.Encrypted" (or Field.Store.Encrypt, for compatibility with Field.Store.Compress), would be a lot easier to use for the average developer?
As others have already pointed out: encrypting just the Term text doesn't do much to aid the overall security of your data -- because a bad guy with access to your index can use the various statistics about your terms (docFreq, term vectors, term positions, etc...) to aid them in cracking your encryption -- maybe a user is okay with that risk, in which case my previous comment about how this can easily be done without modifying any core Lucene classes still holds. What about users who don't think this is an acceptable risk? ... a more robust encryption mechanism is necessary...
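
The statistical concern raised above can be illustrated with a small sketch. Any deterministic per-term transform preserves how often each term occurs, so a frequency profile computed over the opaque tokens is identical to the one over the plaintext. FrequencyLeak is a hypothetical class, and the HMAC used here is only a stand-in for whatever deterministic transform an application might apply; the sample terms are made up.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical demo: term frequencies survive a deterministic transform.
final class FrequencyLeak {
    private static final byte[] KEY = "demo-key".getBytes(StandardCharsets.UTF_8);

    // Stand-in for a deterministic per-term transform (keyed HMAC-SHA256).
    static String opaque(String term) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(KEY, "HmacSHA256"));
            return Base64.getEncoder()
                    .encodeToString(mac.doFinal(term.getBytes(StandardCharsets.UTF_8)));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static List<String> opaqueAll(List<String> terms) {
        List<String> out = new ArrayList<>();
        for (String term : terms) {
            out.add(opaque(term));
        }
        return out;
    }

    // Occurrence counts sorted from most to least frequent, ignoring term text.
    static List<Integer> frequencyProfile(List<String> terms) {
        Map<String, Integer> counts = new HashMap<>();
        for (String term : terms) {
            counts.merge(term, 1, Integer::sum);
        }
        List<Integer> profile = new ArrayList<>(counts.values());
        profile.sort(Collections.reverseOrder());
        return profile;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("smith", "smith", "smith", "jones", "jones", "nguyen");
        System.out.println(frequencyProfile(names));
        System.out.println(frequencyProfile(opaqueAll(names)));
    }
}
```

Someone who knows roughly how common each value is in the underlying data could use such a profile to guess which token corresponds to which term, without ever recovering the key.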
Security is a big topic, we cannot hope to discuss it here. I am talking about some form of data protection, not security.
When you say "a bad guy with access to your index", you imply that nothing can be done to protect the index. But accessing an index which you are determined to protect would not be easy; it would require expertise and money, and carry the risk of a potential jail sentence. If you have National Security in mind, be assured no agency responsible for national security will use open source software which is not certified, and that is downloaded from an unsecure site over the internet, in order to protect the nation (I hope!).
If we are talking about applications which need to protect data from curious or even ill-intentioned eyes, then you can provide a deterrent by encrypting that sensitive data only. It might be a list of names, or balances, or credit card numbers. Lucene alone can only provide some form of data protection, not security. If you accept this limitation you will find it easier to accept the notion of encryption at field level, just like some relational database software encrypts at column level. Just as importantly, you want to be able to search over that encrypted field, something which my proposed code provides (within the stated current limitation).
So exactly what pieces of data about a set of fields in an index need to be encrypted before you can adequately say that those fields are encrypted? Off the top of my head I don't know, but I think the only way to play it safe is to assume that *all* of the data needs to be encrypted.
Cannot agree here, it's application dependent. And keep in mind that once you offer new functionality people will find many original applications for it.
Now the question becomes: do we modify all of the index writing/reading code to add a lot of "if (encrypted) { ... } else { ... }" checks, or is there an easier way to ensure that all of the data is encrypted without impacting the majority of the user base?
A perfectly valid point, only benchmarking will tell by how much the current performance of Lucene will be impacted by the addition of encryption. Somebody in this discussion suggested a Lucene benchmarking tool which can be used. I am not familiar with it, but if it is easy to run then let's do it and resolve this part of the discussion factually.
On a more philosophical level, are you saying that there should not be any added functionality to Lucene if it impacts the performance of those who do not need the additional functionality? This could be a major limitation to the future of Lucene. Perhaps one should set some small % limits to the level of impact, but zero could be too limiting.
I would argue that creating an EncryptedDirectory class with an API that looks something like this.......
.............
.............
 - Do my concerns about that impact make sense to you?
 - Does my (high level) description of how I think encryption might make
   sense as an optional Lucene feature make sense?
 - Are there any advantages you see to your approach that you feel make it
   more worthwhile than a Directory based approach?
Points one and two are perfectly valid and make a lot of sense. Point three is about what is best for the most, given that there is already an OS option to encrypt at directory level.
I like field encryption because it is functionality which cannot be implemented at the OS level, and because of its granularity and its similarity to existing Lucene functionality, it would be more intuitive and easier to implement at the application level. Encrypting everything in a directory would have a performance impact on the application.
I accept your point about the difference between a file system directory and a Lucene directory. But in order to overcome the lack of field-level encryption and to minimise the performance impact on the application you would be forced to create a separate index and directory for each field which you want encrypted. It will work, but is not a solution I would like to have to adopt at the application level.
Finally a point about my code. I was unsuccessful in creating a diff file because I was picking up all kinds of formatting differences as well. If you scan it quickly you will find that it is really very simple and, at least in its current limited implementation, hardly invasive of Lucene's core. All the encryption routines are in a separate class which I placed in the utility package.

Victor
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




--
View this message in context: http://www.nabble.com/Attached-proposed-modifications-to-Lucene-2.0-to-support-Field.Store.Encrypted-tf2727614.html#a7708481
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.