Is the patient data on your device for a single patient, or potentially for
a large number of patients (on a single device)?
Generally, the proper way to "secure" a Lucene/Solr server and its data is
to keep the server and data behind a firewall and also behind an application
layer so that no outside process can directly talk to Solr. If you are
developing a custom server based only on Lucene, then it is up to you to
ensure that your server is secure; Lucene has no role in that security.
Or... are you actually thinking of running Lucene on the device? If so, and
security is a concern, then you are in UNCHARTED TERRITORY and really on
your own.
In that case, you probably need to be talking about custom codecs for
encrypted data for all of your fields. Even then, the in-memory
representation of the field values would be unencrypted and hence insecure
if someone took a stolen device and directly examined the memory.
How much "Rich Lucene Search" do you need to do on the device itself, as
opposed to just looking for "(encrypted) blob storage"? If you want Lucene
to do your searching on field values, the field values will be exposed in
memory. If all you want is to retrieve an encrypted blob based on an
encrypted key, why are you even considering Lucene?
-- Jack Krupansky
-----Original Message-----
From: Rafaela Voiculescu
Sent: Wednesday, June 26, 2013 5:06 AM
To: java-user@lucene.apache.org
Subject: Re: Securing stored data using Lucene
Hello,
Thank you all for your help and the suggestions. They are very useful. I
wanted to clarify more aspects of the project, since I overlooked them in
my previous mails.
To explain the use case exactly, the application should work like this:
- The application works with patient data and that's why we want to keep
things confidential. We are downloading patient data that can go to the
mobile device (it should even work on desktop in a similar way really)
- We have to keep the data in the device due to internet connection
limitation. The device will get, if lucky, internet connection once or
twice per week, hence us needing to keep the patient data locally
- The thing I forgot to mention is that the structure of the patient
data is kept in json format
- Currently, there is no plan to use a database because the structure
of the patient data stored locally might need to change (so we want to
store the JSON as a document in Lucene).
- We also need to make sure that someone who, for instance, steals the
device cannot access the data unless they have the encryption key and
mechanism, and that nobody who is not supposed to access the data can do
so.
This is why we're trying to find a way to somehow encrypt the JSON
documents and still use Lucene, or at least not have the index stored as
plaintext, if that is possible.
Thank you again for all your help and in case this mail has given more
useful details and there are other suggestions or comments, I would be very
happy to read them.
Have a nice day,
Rafaela
On 25 June 2013 20:59, SUJIT PAL <sujit....@comcast.net> wrote:
Hi Rafaela,
I built something along these lines as a proof of concept. All data in the
index was unstored and only fields which were searchable (tokenized and
indexed) were kept in the index. The full record was encrypted and stored
in a MongoDB database. A custom Solr component did the search against the
index, gathered up unique ids of the results, then pulled out the encrypted
data from MongoDB, decrypted it on the fly and rendered the results.
You can find the (Scala) code here:
https://github.com/sujitpal/solr4-extras
(under the src/main/scala/com/mycompany/solr4extras/secure folder).
More information (more or less the same as what I wrote but probably a bit
more readable with inlined code):
http://sujitpal.blogspot.com/2012/12/searching-encrypted-document-collection.html
There are some obvious data sync concerns with this sort of setup, but as
Adrien points out, you can't index encrypted data.
HTH
Sujit
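[Editorial sketch, not Sujit's code] The split described above (a search
index that stores no field values, plus a separate store of encrypted full
records) can be illustrated with plain JDK classes. Everything named here is
a hypothetical stand-in: the `index` map plays the role of the unstored,
indexed Lucene fields, the `store` map plays the role of MongoDB, and
AES-GCM with a caller-supplied key is an assumed choice of cipher. Note that
the searchable tokens remain in plaintext, which is exactly the trade-off
discussed in this thread.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.*;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical stand-in for the "unstored index + encrypted record store" split. */
final class SplitSearchSketch {
    private static final int IV_BYTES = 12;   // usual IV size for GCM
    private static final int TAG_BITS = 128;  // GCM authentication tag length

    private final SecretKeySpec key;
    private final SecureRandom random = new SecureRandom();
    // Stand-in for the search index: token -> ids. No field values are stored.
    private final Map<String, Set<String>> index = new HashMap<>();
    // Stand-in for the external record store: id -> IV followed by ciphertext.
    private final Map<String, byte[]> store = new HashMap<>();

    SplitSearchSketch(byte[] aesKey) {
        this.key = new SecretKeySpec(aesKey, "AES");
    }

    /** Index the searchable text in plaintext; keep the full record only in encrypted form. */
    void add(String id, String searchableText, String fullRecord) throws GeneralSecurityException {
        for (String token : searchableText.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                index.computeIfAbsent(token, t -> new TreeSet<>()).add(id);
            }
        }
        byte[] iv = new byte[IV_BYTES];
        random.nextBytes(iv); // fresh IV for every record
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = cipher.doFinal(fullRecord.getBytes(StandardCharsets.UTF_8));
        byte[] blob = new byte[iv.length + ct.length];
        System.arraycopy(iv, 0, blob, 0, iv.length);
        System.arraycopy(ct, 0, blob, iv.length, ct.length);
        store.put(id, blob);
    }

    /** Look up ids in the index, then fetch and decrypt the matching records on the fly. */
    List<String> search(String term) throws GeneralSecurityException {
        List<String> results = new ArrayList<>();
        Set<String> ids = index.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
        for (String id : ids) {
            byte[] blob = store.get(id);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, blob, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(blob, IV_BYTES, blob.length - IV_BYTES);
            results.add(new String(plain, StandardCharsets.UTF_8));
        }
        return results;
    }
}
```

With a 16-byte key, calling `add("p1", "diabetes insulin", json)` and then
`search("diabetes")` returns the decrypted JSON for p1. In a real setup the
index and the record store are separate systems, so the data sync concerns
mentioned above still apply.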
On Jun 25, 2013, at 4:17 AM, Adrien Grand wrote:
> On Tue, Jun 25, 2013 at 1:03 PM, Rafaela Voiculescu
> <rafaela.voicule...@gmail.com> wrote:
>> Hello,
>
> Hi,
>
>> I am sorry I was not a bit more explicit. I am trying to find an acceptable
>> way to encrypt the data to prevent any access of it in any way unless the
>> person who is trying to access it knows how to decrypt it. As I mentioned,
>> I looked a bit through the patch, but I am not sure of its status.
>
> You can encrypt stored fields, but there is no way to do it correctly
> with fields that have positions indexed: attackers could infer the
> actual terms based on the order of terms (the encrypted version must
> sort the same way as the original terms), frequencies and positions.
>
> --
> Adrien
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
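[Editorial sketch] The frequency part of Adrien's argument can be shown with
a small experiment. The code below is not what the patch does; it is an
assumed, simplified scheme in which every term is replaced by a
deterministic opaque token (an HMAC-SHA256 prefix under a hypothetical key).
Even though the tokens look random, the same term always maps to the same
token, so the distribution of occurrence counts is unchanged and an attacker
who knows typical term frequencies for the domain can start guessing terms.
Sort order and positions, which Adrien also mentions, are not modelled here.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.*;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical demo: deterministic term masking still leaks term frequencies. */
final class TermLeakSketch {
    /** Deterministic opaque token for a term: first 16 hex chars of HMAC-SHA256. */
    static String mask(String term, byte[] key) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        StringBuilder hex = new StringBuilder();
        for (byte b : mac.doFinal(term.getBytes(StandardCharsets.UTF_8))) {
            hex.append(String.format("%02x", b));
        }
        return hex.substring(0, 16);
    }

    /** Term -> number of occurrences, as an index with frequencies would expose. */
    static Map<String, Integer> frequencies(List<String> terms) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : terms) {
            counts.merge(t, 1, Integer::sum);
        }
        return counts;
    }

    /** Sorted occurrence counts: the "shape" an attacker can compare against known text. */
    static List<Integer> profile(Map<String, Integer> counts) {
        List<Integer> p = new ArrayList<>(counts.values());
        Collections.sort(p);
        return p;
    }
}
```

Masking a term list and comparing `profile(frequencies(plain))` with
`profile(frequencies(masked))` gives identical lists, which is the leak:
the most frequent masked token is very likely the most frequent plaintext
term.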