Alternatively, do not store values in the Solr fields. Return a key and fetch 
encrypted data from a database or other repository.
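
A minimal sketch of that pattern, in Python for brevity rather than SolrJ. Everything named here is hypothetical: `INDEX` stands in for the Solr index, `REPOSITORY` for the external store, and `fetch_documents` and `placeholder_decrypt` are illustration-only helpers. base64 is used purely so the example runs with the standard library; it is not encryption and would be replaced by a real cipher.

```python
import base64

# Stand-in for the search index: terms map to opaque record keys only.
# In Solr terms, the searchable field is indexed but not stored.
INDEX = {
    "invoice": ["rec-1", "rec-2"],
    "refund": ["rec-2"],
}

# Stand-in for the external repository holding the protected payloads.
# ASSUMPTION: base64 is a placeholder so the sketch is runnable; it gives
# no secrecy at all and is not a substitute for real encryption.
REPOSITORY = {
    "rec-1": base64.b64encode(b"Invoice 1001 for ACME"),
    "rec-2": base64.b64encode(b"Invoice 1002, refund issued"),
}


def placeholder_decrypt(blob):
    """Reverse the placeholder encoding (swap in a real cipher here)."""
    return base64.b64decode(blob)


def fetch_documents(term, decrypt=placeholder_decrypt):
    """Search returns keys; the payloads come from the repository."""
    keys = INDEX.get(term, [])
    return [decrypt(REPOSITORY[key]).decode("utf-8") for key in keys]
```

For example, `fetch_documents("refund")` looks up the keys for the term and decrypts only the matching records. The indexed terms themselves are still in the clear, so this protects the stored values and nothing more.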

wunder
Walter Underwood
[email protected]
http://observer.wunderwood.org/  (my blog)


On Sep 5, 2015, at 9:40 AM, Erick Erickson <[email protected]> wrote:

> The easiest way to do this is to put the index on
> an encrypted file system. Encrypting the actual
> _tokens_ has a few problems, not the least of
> which is that any encryption algorithm worth
> its salt is going to make most searching totally
> impossible.
> 
> Consider run, runner, running and runs with
> simple wildcards. Searching for run* requires that all 4
> variants have 'run' as a prefix, and any decent
> encryption algorithm will not preserve that prefix. Any
> encryption that _does_ make that search possible
> is trivially broken. I usually stop my thinking there,
> but ngrams, casing, WordDelimiterFilterFactory
> all come immediately to mind as "interesting".
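
Erick's run* point can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Lucene stores terms: a keyed HMAC-SHA256 digest stands in for "any decent encryption" of a token, and `SECRET_KEY`, `protect`, and `prefix_matches` are hypothetical names.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-key"  # hypothetical key, for illustration only
TERMS = ["run", "runner", "running", "runs"]


def protect(token):
    """Keyed digest used as a stand-in for an encrypted token."""
    return hmac.new(SECRET_KEY, token.encode("utf-8"), hashlib.sha256).hexdigest()


def prefix_matches(prefix, terms):
    """Return the terms that start with the given prefix, like run*."""
    return [term for term in terms if term.startswith(prefix)]


# On plaintext terms, run* finds all four variants.
plain_hits = prefix_matches("run", TERMS)

# On protected terms, the protected form of "run" is a prefix of nothing
# except itself, so the wildcard degenerates into an exact match.
protected_terms = [protect(term) for term in TERMS]
protected_hits = prefix_matches(protect("run"), protected_terms)
```

Here `plain_hits` contains all four terms, while `protected_hits` contains only the digest of "run" itself.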
> 
> But what about stored data, you ask? Yes, the
> stored fields are compressed but stored verbatim,
> so I've seen arguments for encrypting _that_ stream,
> but that's really a "feel good" fig-leaf. If I get access to the
> index and it has position information, I can reconstruct
> documents without the stored data as Luke does. The
> process is a bit lossy, but the reconstructed document
> has enough fidelity that it'll give people seriously
> concerned about encryption conniption fits.
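
The reconstruction Erick describes can be sketched with a toy positional index. This is not Luke's code or Lucene's on-disk format; `POSTINGS` and `reconstruct` are hypothetical, and the structure (term, then document id, then token positions) is an assumption made for illustration.

```python
# Hypothetical positional index: term -> {doc_id: [token positions]}.
POSTINGS = {
    "encrypt": {1: [0]},
    "the": {1: [1]},
    "index": {1: [2], 2: [1]},
    "search": {2: [0]},
}


def reconstruct(doc_id, postings):
    """Rebuild a document's token stream from term positions alone."""
    slots = {}
    for term, docs in postings.items():
        for position in docs.get(doc_id, []):
            slots[position] = term
    # Emit the terms in position order; no stored field is consulted.
    return " ".join(slots[position] for position in sorted(slots))
```

The result is lossy in the way the message says: only analyzed tokens come back (lowercased, possibly stemmed, with stop words missing), but the text is still readable.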
> 
> So all in all I have to back up Shawn's comments: You're
> better off isolating your Solr/Lucene system, putting
> authorization to view _documents_ at that level, and possibly
> using an encrypted filesystem.
> 
> FWIW,
> Erick
> 
> On Sat, Sep 5, 2015 at 7:27 AM, Shawn Heisey <[email protected]> wrote:
>> On 9/5/2015 5:06 AM, Adam Retter wrote:
>>> I wondered if there is any facility already existing in Lucene for
>>> encrypting the values stored into the index and still being able to
>>> search them?
>>> 
>>> If not, could anyone tell me whether this is impossible to implement
>>> and, if it is possible, perhaps point me in the right direction?
>>> 
>>> I imagine that the text values and document fields to index (and
>>> optionally store) would be encrypted on the fly by Lucene, perhaps
>>> using a public/private key mechanism. When a user issues
>>> a search query to Lucene they would also provide a key so that Lucene
>>> can decrypt the values as necessary to try and answer their query.
>> 
>> I think you could probably add transparent encryption/decryption at the
>> Lucene level in a custom codec.  That probably has implications for
>> being able to read the older index when it's time to upgrade Lucene,
>> with a complete reindex being the likely solution.  Others will need to
>> confirm ... I'm not very familiar with Lucene code, I'm here for Solr.
>> 
>> Any verification of user identity/permission is probably best done in
>> your own code, before it makes the Lucene query, and wouldn't
>> necessarily be related to the encryption.
>> 
>> Requirements like this are usually driven by paranoid customers or
>> product managers.  I think that when you really start to examine what an
>> attacker has to do to actually reach the unencrypted information (Lucene
>> index in this case), they already have acquired so much access that the
>> system is completely breached and it won't matter what kind of
>> encryption is added.
>> 
>> I find many of these requirements to be silly, and they put an incredible
>> burden on admin and developer resources with little or no benefit.
>> Here's an example of a similar customer encryption requirement that I
>> encountered recently:
>> 
>> We have a web application that has three "hops" involved.  A user talks
>> to a load balancer, which talks to Apache, where the connection is then
>> proxied to a Tomcat server with the AJP protocol.  The customer wanted
>> all three hops encrypted.  The first hop was already encrypted, the
>> second was easy, but the third proved to be very difficult.  Finally we
>> decided that we did not need load balancing on that last hop, and it
>> could simply talk to localhost, eliminating the need to encrypt it.
>> 
>> The customer was worried about an attacker sniffing the traffic on the
>> LAN and seeing details like passwords.  I consider this to be an insane
>> requirement.  In order to sniff that traffic, the attacker would need
>> one of two things:  Root access on a server, or physical access to the
>> infrastructure.  Physical access can be escalated to root access if you
>> know what you're doing.  Once someone has either of those things,
>> encrypted traffic won't matter: they will be able to learn anything they
>> need or do any damage they desire, without even needing to sniff the
>> traffic.
>> 
>> Thanks,
>> Shawn
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>> 
> 
