>
> I'm not a security folk; some of the responders might be. I am just trying
> to deliver a requirement, and have been told by the client that the
> suggested encrypted filesystem etc. is not good enough.


By 'we' I meant both you and the rest of us. I consider you to be on our side
(the Lucene/Solr folks), not on the annoying side (the security folks) :).

> At the moment I am predominantly just trying to understand if it is even
> possible


Sure, understood. It is possible, at least to the extent that I've tested
it in the past. The AESDirectory from one of those issues does what you (or
your client) want.
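The patch itself is not reproduced here. As an illustration only, this is a JDK-only sketch of the buffer-level idea, assuming AES in CTR mode with a 128-bit key and a random 8-byte nonce stored unencrypted at the start of each file. CtrBufferCipher and that layout are hypothetical; they are not the patch's code and not a Lucene API. CTR is used because it lets a reader decrypt any buffer starting at any file offset, which is what an IndexInput needs after a seek.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/**
 * Hypothetical helper (not a Lucene class): encrypts or decrypts any byte
 * range of a file with AES/CTR, so a reader can decrypt only the buffer it
 * needs after a seek instead of the whole file.
 */
public final class CtrBufferCipher {
    private static final int AES_BLOCK = 16; // AES block size in bytes

    private final SecretKeySpec key;
    private final byte[] fileNonce; // assumed: 8 random bytes, stored unencrypted per file

    public CtrBufferCipher(byte[] rawKey, byte[] fileNonce) {
        if (fileNonce.length != 8) {
            throw new IllegalArgumentException("nonce must be 8 bytes");
        }
        this.key = new SecretKeySpec(rawKey, "AES");
        this.fileNonce = fileNonce.clone();
    }

    /** Counter block for the AES block containing filePos: nonce || blockIndex. */
    private byte[] counterFor(long filePos) {
        ByteBuffer iv = ByteBuffer.allocate(AES_BLOCK); // big-endian by default
        iv.put(fileNonce);
        iv.putLong(filePos / AES_BLOCK);
        return iv.array();
    }

    /**
     * XORs buf with the keystream starting at byte offset filePos. CTR is
     * symmetric, so the same call encrypts a buffer on flush and decrypts it
     * on refill.
     */
    public byte[] apply(byte[] buf, long filePos) throws GeneralSecurityException {
        if (buf.length == 0) {
            return new byte[0];
        }
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(counterFor(filePos)));
        int skip = (int) (filePos % AES_BLOCK); // offset inside the first AES block
        if (skip > 0) {
            c.update(new byte[skip]); // discard keystream bytes before filePos
        }
        return c.doFinal(buf);
    }
}
```

In this sketch an encrypting IndexOutput would call `apply(buffer, offsetOfBuffer)` on every flush, and a decrypting IndexInput would make the same call on every refill. Since Lucene index files are write-once, a fresh random nonce per file avoids ever reusing a keystream. Note that CTR gives confidentiality only; it does not detect tampering.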

> However, I assume that if someone gets root access to the server, they can
> just dump the server's RAM to a disk file and have access to all the keys
> that happen to be in RAM anyway and that I can't really protect against
> that.


This is what I meant by protecting the program's RAM (it answers your last
question too). If the attacker can dump the process's RAM and extract the
encryption keys (or the unencrypted cached index content), you're back to
square one. This is why I believe most of us think that encrypting the index
is not THE solution for protecting the data; protecting the system itself
is. Once someone has broken in, the assumption should be that there's very
little (if anything) you can do to prevent data theft.

> I think the fact that someone might use statistical analysis to guess and
> potentially decrypt the index will be of little worry to them (even if I
> explain it).


Just in case I wasn't clear, let me clarify this. Using an
EncryptingDirectory *does not* allow one to use statistical analysis to
guess the index's content. It is a low-level solution, one level above an
encrypted file system. To an attacker who doesn't have the encryption keys,
the index will look like a series of garbage bytes. Even if you know where
to look for the terms (i.e. which files), the bytes will not conform to the
regular Lucene index file format.
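A small JDK-only sketch of that last point, assuming (from our reading of Lucene's CodecUtil) that codec headers begin with the 4-byte magic 0x3FD76C17 written big-endian. HeaderCheck, the simplified fake file layout and the codec name are hypothetical; the point is only that after whole-file AES/CTR encryption the magic, and everything after it, no longer looks like Lucene's format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical demo: an encrypted index file no longer starts with a codec header. */
public final class HeaderCheck {
    // Assumed to match Lucene's CodecUtil.CODEC_MAGIC (first 4 bytes of a codec header).
    static final int CODEC_MAGIC = 0x3FD76C17;

    private HeaderCheck() {}

    /** A naive first check an attacker might run: does the file start with the magic? */
    static boolean looksLikeLuceneFile(byte[] file) {
        return file.length >= 4 && ByteBuffer.wrap(file, 0, 4).getInt() == CODEC_MAGIC;
    }

    /** Simplified stand-in for an index file: magic, codec name, version. */
    static byte[] fakeIndexFile() {
        byte[] name = "HypotheticalCodec".getBytes(StandardCharsets.US_ASCII);
        return ByteBuffer.allocate(4 + name.length + 4)
                .putInt(CODEC_MAGIC) // 4-byte magic, big-endian
                .put(name)           // codec name (a real header also stores its length)
                .putInt(1)           // format version
                .array();
    }

    /** Encrypts the whole file with AES/CTR under a freshly generated key and IV. */
    static byte[] encryptWholeFile(byte[] file) throws GeneralSecurityException {
        SecureRandom rnd = new SecureRandom();
        byte[] key = new byte[16];
        byte[] iv = new byte[16];
        rnd.nextBytes(key);
        rnd.nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return c.doFinal(file);
    }
}
```

The plaintext stand-in passes `looksLikeLuceneFile`, while its encrypted copy fails it (barring a 1 in 2^32 coincidence), and the same holds for any structure an attacker might look for further into the file.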

BTW, a Codec can work at the same level and would achieve the same results;
I just believe a Codec is overkill. However, if all you want is to encrypt
some parts of the index, e.g. only the terms, you could explore the Codec
approach. But as I wrote before, I don't believe that's a good solution:
it's better to encrypt everything into one giant blob, to avoid decryption
by statistical analysis.
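To make the statistics argument concrete, here is a hypothetical JDK-only sketch of an "encrypt only the terms" scheme, assuming each term is encrypted but its document frequency is left unencrypted next to it. AES/ECB is used purely because it is deterministic and short to demo, not as a recommendation. TermStatsLeak and its methods are illustrative names, not Lucene APIs.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical demo: encrypting only the terms still leaks their statistics. */
public final class TermStatsLeak {
    private TermStatsLeak() {}

    /** Deterministic per-term encryption (AES/ECB), Base64 so it can be a map key. */
    static String encryptTerm(SecretKeySpec key, String term) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding");
        c.init(Cipher.ENCRYPT_MODE, key);
        return Base64.getEncoder().encodeToString(c.doFinal(term.getBytes(StandardCharsets.UTF_8)));
    }

    /** Used only to check the attacker's guess; the attacker has no key. */
    static String decryptTerm(SecretKeySpec key, String encrypted) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding");
        c.init(Cipher.DECRYPT_MODE, key);
        return new String(c.doFinal(Base64.getDecoder().decode(encrypted)), StandardCharsets.UTF_8);
    }

    /** "Terms-only" encryption: each term is hidden, but its docFreq is kept as-is. */
    static Map<String, Integer> encryptTermsOnly(SecretKeySpec key, Map<String, Integer> docFreqs)
            throws GeneralSecurityException {
        Map<String, Integer> out = new HashMap<>();
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            out.put(encryptTerm(key, e.getKey()), e.getValue());
        }
        return out;
    }

    /** The attacker's first guess: the encrypted term with the highest docFreq. */
    static String highestDocFreq(Map<String, Integer> encrypted) {
        String best = null;
        for (Map.Entry<String, Integer> e : encrypted.entrySet()) {
            if (best == null || e.getValue() > encrypted.get(best)) {
                best = e.getKey();
            }
        }
        return best;
    }
}
```

With docFreqs such as the=950, lucene=40, encryption=12, the attacker's top guess decrypts to "the" even though no term was ever seen in the clear. Encrypting whole files, as the Directory approach does, hides the docFreq values along with the terms, so this shortcut disappears.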

Shai

On Wed, Sep 9, 2015 at 2:28 AM, Adam Retter <adam.ret...@googlemail.com>
wrote:

>
>> The problem with encrypted file systems is that if someone gets access to
>> the file system (not the disk, the file system, e.g. via ssh), it is wide
>> open to them. It's like my work laptop: its disk is encrypted, but after
>> I've entered my password, all files are readable to me. However, files
>> that are password-protected aren't, and that's what security experts want:
>> that even if an attacker stole the machine and has all the passwords and
>> the time in the world, without the public/private key of the encrypted
>> index, he won't be able to read it. I'm not justifying it, just repeating
>> what I was told. I think it's silly, though: if someone managed to get
>> hold of the machine, the login password, root access... what are the
>> chances he doesn't already have the other keys?
>>
>
> I was rather assuming an encrypted filesystem (a partition if you like)
> that is only available to a specific system user which our application runs
> under. This filesystem would only hold the Lucene indexes, it would not be
> a general purpose system boot filesystem as you are describing.
>
>
>> Anyway, we're here to solve the technical problem, and we obviously
>> aren't the ones making these decisions, and it's futile attempting to argue
>> with security folks, so let's address the question of how to achieve
>> encryption.
>>
>
> I'm not a security folk; some of the responders might be. I am just trying
> to deliver a requirement, and have been told by the client that the
> suggested encrypted filesystem etc. is not good enough.
>
>
>> Personally, I wouldn't use a Codec to achieve encryption. It's
>> overcomplicated IMO. An encrypted Directory is a simpler solution. You
>> will need to implement an EncryptingIndexOutput and a matching
>> DecryptingIndexInput, but that's more or less it. The encryption/decryption
>> happens in buffers, so you will want to extend the respective BufferedIO
>> classes. The issues mentioned above should give you a head start, even
>> though the patches are old and likely don't compile against new versions,
>> but they contain the gist of it.
>>
>
> Thanks, I will take a look. At the moment I am predominantly just trying to
> understand if it is even possible; it is unlikely the client will sign off
> any real development work on this until the New Year. If they sign off,
> expect some more questions to the list from me :-p
>
>
>> Just make sure your application, or rather the process running Lucene,
>> receives the public/private key in a non-obvious way, so that if someone
>> does get hold of the machine, he can't obtain that information!
>>
> OK, of course I will try to protect my app and the paths to and from it.
> However, I assume that if someone gets root access to the server, they can
> just dump the server's RAM to a disk file and have access to all the keys
> that happen to be in RAM anyway and that I can't really protect against
> that.
>
>> Also, as for encrypting the terms themselves, beyond the problems
>> mentioned above about wildcard queries, there is the risk of someone
>> guessing the terms based on their statistics. If the attacker knows the
>> corpus domain, I assume it shouldn't be hard for him to guess that a
>> certain word with a high DF and TF is probably "the" and proceed from there.
>>
>
> Given that my client doesn't seem to understand that this is probably not
> a good idea, I think the fact that someone might use statistical analysis
> to guess and potentially decrypt the index will be of little worry to them
> (even if I explain it).
>
>
>> Again, I'm no security expert and I've learned it's sometimes futile
>> trying to argue with them. If you can convince them though that the system
>> as a whole is protected enough, and if breached an encrypted index is
>> likely already breached too, you can avoid the complexity. In my
>> experience, encryption hurts performance, but you can improve that by e.g.
>> buffering parts unencrypted, but then you also need to prove your program's
>> memory is protected...
>>
> Mainly understood, but can you elaborate on "prove your program's memory
> is protected"?
>
>
> Thanks
>
> --
> Adam Retter
>
> skype: adam.retter
> tweet: adamretter
> http://www.adamretter.org.uk
>
