Thanks Walter, that would be a neat solution if we just wanted to store values, but we also want full-text query capabilities.
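To make the wildcard problem Erick describes in the quoted thread concrete, here is a minimal Python sketch. It does not use any Lucene or Solr API. The key and the `protect` helper are hypothetical, and HMAC-SHA256 is only a stand-in for "some deterministic per-token encryption". It shows that exact-term lookup survives such a transformation, while a `run*` prefix search no longer finds the variants:

```python
import hashlib
import hmac

# Hypothetical key, for illustration only.
KEY = b"illustrative-secret-key"


def protect(token: str) -> str:
    """Deterministically transform a token.

    HMAC-SHA256 is an assumed stand-in for per-token encryption;
    it is not what Lucene or Solr does.
    """
    return hmac.new(KEY, token.encode("utf-8"), hashlib.sha256).hexdigest()


terms = ["run", "runner", "running", "runs"]

# Plaintext terms: the wildcard run* matches all four variants.
plain_matches = [t for t in terms if t.startswith("run")]

# Protected terms: an exact-match lookup still works, because the
# transformation is deterministic.
index = {protect(t): t for t in terms}
exact_hit = protect("run") in index

# A prefix search over the protected terms only finds the exact term,
# because the variants no longer share a common prefix.
protected_prefix = protect("run")
protected_matches = [p for p in index if p.startswith(protected_prefix)]

print(len(plain_matches), exact_hit, len(protected_matches))
```

Any scheme that did preserve the shared prefix would leak exactly the structure that the encryption is supposed to hide, which is the trade-off raised in the thread.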
On 5 September 2015 at 17:56, Walter Underwood <[email protected]> wrote:

> Alternatively, do not store values in the Solr fields. Return a key and
> fetch encrypted data from a database or other repository.
>
> wunder
> Walter Underwood
> [email protected]
> http://observer.wunderwood.org/ (my blog)
>
>
> On Sep 5, 2015, at 9:40 AM, Erick Erickson <[email protected]>
> wrote:
>
> The easiest way to do this is put the index over
> an encrypted file system. Encrypting the actual
> _tokens_ has a few problems, not the least of
> which is that any encryption algorithm worth
> its salt is going to make most searching totally
> impossible.
>
> Consider run, runner, running and runs with
> simple wildcards. Searching for run* requires that all 4
> variants have 'run' as a prefix, and any decent
> encryption algorithm will not do that. Any
> encryption that _does_ make that search possible
> is trivially broken. I usually stop my thinking there,
> but ngrams, casing, WordDelimiterFilterFactory
> all come immediately to mind as "interesting".
>
> But what about stored data you ask? Yes, the
> stored fields are compressed but stored verbatim,
> so I've seen arguments for encrypting _that_ stream,
> but that's really a "feel good" fig-leaf. If I get access to the
> index and it has position information, I can reconstruct
> documents without the stored data as Luke does. The
> process is a bit lossy, but the reconstructed document
> has enough fidelity that it'll give people seriously
> concerned about encryption conniption fits.
>
> So all in all I have to back up Shawn's comments: You're
> better off isolating your Solr/Lucene system, putting
> authorization to view _documents_ at that level, and possibly
> using an encrypted filesystem.
>
> FWIW,
> Erick
>
> On Sat, Sep 5, 2015 at 7:27 AM, Shawn Heisey <[email protected]> wrote:
>
> On 9/5/2015 5:06 AM, Adam Retter wrote:
>
> I wondered if there is any facility already existing in Lucene for
> encrypting the values stored into the index and still being able to
> search them?
>
> If not, I wondered if anyone could tell me if this is impossible to
> implement, and if not to point me perhaps in the right direction?
>
> I imagine that just the text values and document fields to index (and
> optionally store) in the index would be either encrypted on the fly by
> Lucene using perhaps a public/private key mechanism. When a user issues
> a search query to Lucene they would also provide a key so that Lucene
> can decrypt the values as necessary to try and answer their query.
>
>
> I think you could probably add transparent encryption/decryption at the
> Lucene level in a custom codec. That probably has implications for
> being able to read the older index when it's time to upgrade Lucene,
> with a complete reindex being the likely solution. Others will need to
> confirm ... I'm not very familiar with Lucene code, I'm here for Solr.
>
> Any verification of user identity/permission is probably best done in
> your own code, before it makes the Lucene query, and wouldn't
> necessarily be related to the encryption.
>
> Requirements like this are usually driven by paranoid customers or
> product managers. I think that when you really start to examine what an
> attacker has to do to actually reach the unencrypted information (Lucene
> index in this case), they already have acquired so much access that the
> system is completely breached and it won't matter what kind of
> encryption is added.
>
> I find many of these requirements to be silly, and put an incredible
> burden on admin and developer resources with little or no benefit.
> Here's an example of similar customer encryption requirement which I
> encountered recently:
>
> We have a web application that has three "hops" involved. A user talks
> to a load balancer, which talks to Apache, where the connection is then
> proxied to a Tomcat server with the AJP protocol. The customer wanted
> all three hops encrypted. The first hop was already encrypted, the
> second was easy, but the third proved to be very difficult. Finally we
> decided that we did not need load balancing on that last hop, and it
> could simply talk to localhost, eliminating the need to encrypt it.
>
> The customer was worried about an attacker sniffing the traffic on the
> LAN and seeing details like passwords. I consider this to be an insane
> requirement. In order to sniff that traffic, the attacker would need
> one of two things: Root access on a server, or physical access to the
> infrastructure. Physical access can be escalated to root access if you
> know what you're doing. Once someone has either of those things,
> encrypted traffic won't matter, they will be able to learn anything they
> need or do any damage they desire, without even needing to sniff the
> traffic.
>
> Thanks,
> Shawn
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]

--
Adam Retter

skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk
