Thanks very much Jack, I will take a look into those.

On 8 September 2015 at 16:21, Jack Krupansky <[email protected]> wrote:
> Here's an old Lucene issue/patch for an AES encrypted Lucene directory
> class that might give you some ideas:
> https://issues.apache.org/jira/browse/LUCENE-2228
>
> No idea what happened to it.
>
> An even older issue attempting to add encryption for specific fields:
> https://issues.apache.org/jira/browse/LUCENE-737
>
> -- Jack Krupansky
>
> On Tue, Sep 8, 2015 at 11:07 AM, Adam Retter <[email protected]>
> wrote:
>
>>> The easiest way to do this is put the index over
>>> an encrypted file system. Encrypting the actual
>>> _tokens_ has a few problems, not the least of
>>> which is that any encryption algorithm worth
>>> its salt is going to make most searching totally
>>> impossible.
>>>
>>
>> I already suggested an encrypted filesystem to the customer but
>> unfortunately that was rejected.
>>
>>
>>> Consider run, runner, running and runs with
>>> simple wildcards. Searching for run* requires that all 4
>>> variants have 'run' as a prefix, and any decent
>>> encryption algorithm will not do that. Any
>>> encryption that _does_ make that search possible
>>> is trivially broken. I usually stop my thinking there,
>>> but ngrams, casing, WordDelimiterFilterFactory
>>> all come immediately to mind as "interesting".
>>>
>>
>> I was rather hoping that I could do the encryption and subsequent
>> decryption at a level below the search itself, so that when the query
>> examines the data it sees the decrypted values so that things like prefix
>> scans etc would indeed still work. Previously in this thread, Shawn
>> suggested writing a custom codec, I wonder if that would enable querying?
>>
>>
>>> But what about stored data you ask? Yes, the
>>> stored fields are compressed but stored verbatim,
>>> so I've seen arguments for encrypting _that_ stream,
>>> but that's really a "feel good" fig-leaf. If I get access to the
>>> index and it has position information, I can reconstruct
>>> documents without the stored data as Luke does. The
>>> process is a bit lossy, but the reconstructed document
>>> has enough fidelity that it'll give people seriously
>>> concerned about encryption conniption fits.
>>>
>>
>> Exactly!
>>
>>
>>>
>>> So all in all I have to back up Shawn's comments: You're
>>> better off isolating your Solr/Lucene system, putting
>>> authorization to view _documents_ at that level, and possibly
>>> using an encrypted filesystem.
>>>
>>> FWIW,
>>> Erick
>>>
>>> On Sat, Sep 5, 2015 at 7:27 AM, Shawn Heisey <[email protected]>
>>> wrote:
>>> > On 9/5/2015 5:06 AM, Adam Retter wrote:
>>> >> I wondered if there is any facility already existing in Lucene for
>>> >> encrypting the values stored into the index and still being able to
>>> >> search them?
>>> >>
>>> >> If not, I wondered if anyone could tell me if this is impossible to
>>> >> implement, and if not to point me perhaps in the right direction?
>>> >>
>>> >> I imagine that just the text values and document fields to index (and
>>> >> optionally store) in the index would be either encrypted on the fly by
>>> >> Lucene using perhaps a public/private key mechanism. When a user issues
>>> >> a search query to Lucene they would also provide a key so that Lucene
>>> >> can decrypt the values as necessary to try and answer their query.
>>> >
>>> > I think you could probably add transparent encryption/decryption at the
>>> > Lucene level in a custom codec. That probably has implications for
>>> > being able to read the older index when it's time to upgrade Lucene,
>>> > with a complete reindex being the likely solution. Others will need to
>>> > confirm ... I'm not very familiar with Lucene code, I'm here for Solr.
>>> >
>>> > Any verification of user identity/permission is probably best done in
>>> > your own code, before it makes the Lucene query, and wouldn't
>>> > necessarily be related to the encryption.
>>> >
>>> > Requirements like this are usually driven by paranoid customers or
>>> > product managers. I think that when you really start to examine what an
>>> > attacker has to do to actually reach the unencrypted information (Lucene
>>> > index in this case), they already have acquired so much access that the
>>> > system is completely breached and it won't matter what kind of
>>> > encryption is added.
>>> >
>>> > I find many of these requirements to be silly, and put an incredible
>>> > burden on admin and developer resources with little or no benefit.
>>> > Here's an example of similar customer encryption requirement which I
>>> > encountered recently:
>>> >
>>> > We have a web application that has three "hops" involved. A user talks
>>> > to a load balancer, which talks to Apache, where the connection is then
>>> > proxied to a Tomcat server with the AJP protocol. The customer wanted
>>> > all three hops encrypted. The first hop was already encrypted, the
>>> > second was easy, but the third proved to be very difficult. Finally we
>>> > decided that we did not need load balancing on that last hop, and it
>>> > could simply talk to localhost, eliminating the need to encrypt it.
>>> >
>>> > The customer was worried about an attacker sniffing the traffic on the
>>> > LAN and seeing details like passwords. I consider this to be an insane
>>> > requirement. In order to sniff that traffic, the attacker would need
>>> > one of two things: Root access on a server, or physical access to the
>>> > infrastructure. Physical access can be escalated to root access if you
>>> > know what you're doing. Once someone has either of those things,
>>> > encrypted traffic won't matter, they will be able to learn anything they
>>> > need or do any damage they desire, without even needing to sniff the
>>> > traffic.
>>> >
>>> > Thanks,
>>> > Shawn
>>> >
>>> >
>>> > ---------------------------------------------------------------------
>>> > To unsubscribe, e-mail: [email protected]
>>> > For additional commands, e-mail: [email protected]
>>> >
>>>
>>
>>
>> --
>> Adam Retter
>>
>> skype: adam.retter
>> tweet: adamretter
>> http://www.adamretter.org.uk
>>
>

-- 
Adam Retter

skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk
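
For illustration, the run* point quoted above can be sketched in a few lines of Python. This is only a sketch under stated assumptions, not anything Lucene does: `protect` is a hypothetical helper that uses HMAC-SHA256 as a stand-in for any deterministic, key-dependent token transform (it is a keyed hash, not reversible encryption), and `KEY` and `prefix_search` are made-up names. It shows that an exact term still matches after the transform, while the shared 'run' prefix is gone.

```python
import hashlib
import hmac

# Hypothetical key, for illustration only.
KEY = b"demo-key-not-for-production"


def protect(token: str) -> str:
    """Deterministic keyed transform of one token.

    Assumption: HMAC-SHA256 stands in for per-token encryption here.
    It is a keyed hash, so it cannot be decrypted.
    """
    return hmac.new(KEY, token.encode("utf-8"), hashlib.sha256).hexdigest()


def prefix_search(prefix: str, terms: list[str]) -> list[str]:
    """Return the terms that start with the given prefix."""
    return [t for t in terms if t.startswith(prefix)]


tokens = ["run", "runner", "running", "runs"]

# On plaintext terms, run* matches all four variants.
plain_hits = prefix_search("run", tokens)

# On protected terms, the protected form of "run" is no longer a prefix
# of the protected forms of "runner", "running" or "runs"; only the
# exact term itself still matches.
protected_terms = [protect(t) for t in tokens]
protected_hits = prefix_search(protect("run"), protected_terms)

print(plain_hits)
print(len(protected_hits))
```

This is why the thread leans towards encrypting below the search layer (filesystem, directory or codec), where the query still sees plaintext terms.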
