Thanks very much, Jack. I will take a look at those.

On 8 September 2015 at 16:21, Jack Krupansky <[email protected]>
wrote:

> Here's an old Lucene issue/patch for an AES encrypted Lucene directory
> class that might give you some ideas:
> https://issues.apache.org/jira/browse/LUCENE-2228
>
> No idea what happened to it.
>
> An even older issue attempting to add encryption for specific fields:
> https://issues.apache.org/jira/browse/LUCENE-737
>
> -- Jack Krupansky
>
> On Tue, Sep 8, 2015 at 11:07 AM, Adam Retter <[email protected]>
> wrote:
>
>>
>>> The easiest way to do this is to put the index on
>>> an encrypted file system. Encrypting the actual
>>> _tokens_ has a few problems, not the least of
>>> which is that any encryption algorithm worth
>>> its salt is going to make most searching totally
>>> impossible.
>>>
>>
>> I already suggested an encrypted filesystem to the customer but
>> unfortunately that was rejected.
>>
>>
>>> Consider run, runner, running and runs with
>>> simple wildcards. Searching for run* requires that all 4
>>> variants have 'run' as a prefix, and any decent
>>> encryption algorithm will not do that. Any
>>> encryption that _does_ make that search possible
>>> is trivially broken. I usually stop my thinking there,
>>> but ngrams, casing, WordDelimiterFilterFactory
>>> all come immediately to mind as "interesting".
>>>
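
To make the run* point concrete, here is a minimal Python sketch. It is not taken from Lucene or from either JIRA patch. As an assumption for illustration it uses HMAC-SHA256 as a stand-in for a deterministic per-token cipher (it is a keyed hash, so it is not reversible encryption), and the key and the `protect` helper are hypothetical names.

```python
import hashlib
import hmac

KEY = b"demo-key"  # hypothetical key, for illustration only


def protect(token: str) -> str:
    """Stand-in for deterministic token encryption: a keyed hash of the whole token."""
    return hmac.new(KEY, token.encode("utf-8"), hashlib.sha256).hexdigest()


terms = ["run", "runner", "running", "runs"]

# Plaintext index: a prefix scan for run* matches all four variants.
plain_hits = [t for t in terms if t.startswith("run")]

# Protected index: the stored terms no longer share a prefix with protect("run"),
# so the same prefix scan only finds the exact term "run".
protected_terms = [protect(t) for t in terms]
query_prefix = protect("run")
protected_hits = [p for p in protected_terms if p.startswith(query_prefix)]

print(plain_hits)
print(len(protected_hits))
```

Exact-term lookups still work in this sketch because the transformation is deterministic, but prefix, wildcard and stemming-style matches do not, which is the trade-off described above.
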
>>
>> I was rather hoping that I could do the encryption and subsequent
>> decryption at a level below the search itself, so that when the query
>> examines the data it sees the decrypted values and things like prefix
>> scans would still work. Earlier in this thread, Shawn suggested
>> writing a custom codec; I wonder whether that would enable querying?
>>
>>
>>> But what about stored data you ask? Yes, the
>>> stored fields are compressed but stored verbatim,
>>> so I've seen arguments for encrypting _that_ stream,
>>> but that's really a "feel good" fig-leaf. If I get access to the
>>> index and it has position information, I can reconstruct
>>> documents without the stored data as Luke does. The
>>> process is a bit lossy, but the reconstructed document
>>> has enough fidelity that it'll give people seriously
>>> concerned about encryption conniption fits.
>>>
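
The reconstruction argument can be sketched in a few lines of Python. This is a toy positional inverted index, not Lucene's on-disk format and not how Luke is implemented; the function names and the sample document are made up for illustration.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map term -> {doc_id: [positions]}, as a positional inverted index does."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Lowercasing and whitespace splitting stand in for an analyzer.
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def reconstruct(index, doc_id):
    """Rebuild a document's token sequence using only terms and positions."""
    slots = {}
    for term, postings in index.items():
        for pos in postings.get(doc_id, []):
            slots[pos] = term
    return " ".join(slots[p] for p in sorted(slots))


docs = {1: "The patient was diagnosed on Monday"}  # made-up sample text
index = build_positional_index(docs)
rebuilt = reconstruct(index, 1)
print(rebuilt)
```

No stored field is consulted here. The rebuilt text loses the original casing (and, with a real analyzer, stop words or stemmed forms), but it is readable enough to show why encrypting only the stored-field stream does not protect the content.
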
>>
>> Exactly!
>>
>>
>>>
>>> So all in all I have to back up Shawn's comments: You're
>>> better off isolating your Solr/Lucene system, putting
>>> authorization to view _documents_ at that level, and possibly
>>> using an encrypted filesystem.
>>>
>>> FWIW,
>>> Erick
>>>
>>> On Sat, Sep 5, 2015 at 7:27 AM, Shawn Heisey <[email protected]>
>>> wrote:
>>> > On 9/5/2015 5:06 AM, Adam Retter wrote:
>>> >> I wondered if there is any facility already existing in Lucene for
>>> >> encrypting the values stored into the index and still being able to
>>> >> search them?
>>> >>
>>> >> If not, I wondered if anyone could tell me if this is impossible to
>>> >> implement, and if not to point me perhaps in the right direction?
>>> >>
>>> >> I imagine that just the text values and document fields to index (and
>>> >> optionally store) in the index would be encrypted on the fly by
>>> >> Lucene, perhaps using a public/private key mechanism. When a user issues
>>> >> a search query to Lucene they would also provide a key so that Lucene
>>> >> can decrypt the values as necessary to try and answer their query.
>>> >
>>> > I think you could probably add transparent encryption/decryption at the
>>> > Lucene level in a custom codec.  That probably has implications for
>>> > being able to read the older index when it's time to upgrade Lucene,
>>> > with a complete reindex being the likely solution.  Others will need to
>>> > confirm ... I'm not very familiar with Lucene code, I'm here for Solr.
>>> >
>>> > Any verification of user identity/permission is probably best done in
>>> > your own code, before it makes the Lucene query, and wouldn't
>>> > necessarily be related to the encryption.
>>> >
>>> > Requirements like this are usually driven by paranoid customers or
>>> > product managers.  I think that when you really start to examine what an
>>> > attacker has to do to actually reach the unencrypted information (Lucene
>>> > index in this case), they already have acquired so much access that the
>>> > system is completely breached and it won't matter what kind of
>>> > encryption is added.
>>> >
>>> > I find many of these requirements to be silly, and put an incredible
>>> > burden on admin and developer resources with little or no benefit.
>>> > Here's an example of a similar customer encryption requirement which I
>>> > encountered recently:
>>> >
>>> > We have a web application that has three "hops" involved.  A user talks
>>> > to a load balancer, which talks to Apache, where the connection is then
>>> > proxied to a Tomcat server with the AJP protocol.  The customer wanted
>>> > all three hops encrypted.  The first hop was already encrypted, the
>>> > second was easy, but the third proved to be very difficult.  Finally we
>>> > decided that we did not need load balancing on that last hop, and it
>>> > could simply talk to localhost, eliminating the need to encrypt it.
>>> >
>>> > The customer was worried about an attacker sniffing the traffic on the
>>> > LAN and seeing details like passwords.  I consider this to be an insane
>>> > requirement.  In order to sniff that traffic, the attacker would need
>>> > one of two things:  Root access on a server, or physical access to the
>>> > infrastructure.  Physical access can be escalated to root access if you
>>> > know what you're doing.  Once someone has either of those things,
>>> > encrypted traffic won't matter; they will be able to learn anything they
>>> > need or do any damage they desire, without even needing to sniff the
>>> > traffic.
>>> >
>>> > Thanks,
>>> > Shawn
>>> >
>>> >
>>> > ---------------------------------------------------------------------
>>> > To unsubscribe, e-mail: [email protected]
>>> > For additional commands, e-mail: [email protected]
>>> >
>>>
>>>
>>
>>
>> --
>> Adam Retter
>>
>> skype: adam.retter
>> tweet: adamretter
>> http://www.adamretter.org.uk
>>
>
>


-- 
Adam Retter

skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk
