Does each device hold data for a single patient, or potentially for a large number of patients?

Generally, the proper way to "secure" a Lucene/Solr server and its data is to keep the server and data behind a firewall and also behind an application layer, so that no outside process can talk to Solr directly. If you are developing a custom server based only on Lucene, then it is up to you to ensure that your server is secure; Lucene has no role in that security.

Or... are you actually thinking of running Lucene on the device? If so, and security is a concern, then you are in UNCHARTED TERRITORY and really on your own.

In that case, you probably need to be talking about custom codecs for encrypted data for all of your fields. Even then, the in-memory representation of the field values would be unencrypted and hence insecure if someone took a stolen device and directly examined the memory.

How much "Rich Lucene Search" do you need to do on the device itself, as opposed to just looking for "(encrypted) blob storage"? If you want Lucene to do your searching on field values, the field values will be exposed in memory. If all you want is to retrieve an encrypted blob based on an encrypted key, why are you even considering Lucene?
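To illustrate the blob-only case, here is a minimal sketch that uses only the JDK's built-in crypto and no Lucene at all. The class name EncryptedBlobStore, the in-memory map, and the two separately supplied keys are assumptions made for illustration; key management and on-disk persistence are deliberately left out.

```java
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: opaque lookup keys mapped to AES-GCM encrypted blobs. */
final class EncryptedBlobStore {
    private static final int IV_BYTES = 12;   // commonly recommended nonce size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length

    private final SecretKeySpec aesKey;
    private final SecretKeySpec macKey;
    private final SecureRandom random = new SecureRandom();
    // Stand-in for real storage: opaque key -> IV followed by ciphertext.
    private final Map<String, byte[]> blobs = new HashMap<>();

    EncryptedBlobStore(byte[] aesKeyBytes, byte[] macKeyBytes) {
        this.aesKey = new SecretKeySpec(aesKeyBytes, "AES");
        this.macKey = new SecretKeySpec(macKeyBytes, "HmacSHA256");
    }

    /** Keyed hash of the lookup id, so the store never holds the plain id. */
    String opaqueKey(String id) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(macKey);
            byte[] digest = mac.doFinal(id.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Encrypts the JSON with a fresh random IV and stores IV + ciphertext. */
    void put(String id, String json) {
        try {
            byte[] iv = new byte[IV_BYTES];
            random.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, aesKey, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(json.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[iv.length + ct.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(ct, 0, out, iv.length, ct.length);
            blobs.put(opaqueKey(id), out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the decrypted JSON, or null if the id is unknown. */
    String get(String id) {
        byte[] stored = blobs.get(opaqueKey(id));
        if (stored == null) {
            return null;
        }
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, aesKey,
                    new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
            byte[] pt = cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
            return new String(pt, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] aes = new byte[16];
        byte[] mac = new byte[32];
        SecureRandom rnd = new SecureRandom();
        rnd.nextBytes(aes);
        rnd.nextBytes(mac);
        EncryptedBlobStore store = new EncryptedBlobStore(aes, mac);
        store.put("patient-42", "{\"name\":\"example\"}");
        System.out.println(store.get("patient-42"));
    }
}
```

Note that the decrypted JSON still sits in memory once get returns, so the in-memory caveat applies to this approach as well.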

-- Jack Krupansky

-----Original Message-----
From: Rafaela Voiculescu
Sent: Wednesday, June 26, 2013 5:06 AM
To: java-user@lucene.apache.org
Subject: Re: Securing stored data using Lucene

Hello,

Thank you all for your help and the suggestions. They are very useful. I
wanted to clarify more aspects of the project, since I overlooked them in
my previous mails.

To explain the use case exactly, the application should work like this:

  - The application works with patient data, which is why we want to keep
  things confidential. We download patient data to the mobile device (it
  should work on desktop in a similar way).
  - We have to keep the data on the device because of limited internet
  connectivity. The device will get, if lucky, an internet connection once
  or twice per week, hence the need to keep the patient data locally.
  - The thing I forgot to mention is that the patient data is kept in JSON
  format.
  - Currently there is no plan to use a database, because the structure of
  the patient data stored locally might need to change (so we want to store
  the JSON as a document in Lucene).
  - We need to make sure that someone who, for instance, steals the device
  cannot access the data unless they have the encryption key and mechanism,
  and that nobody who is not supposed to access the data can do so.

This is why we're trying to find a way to somehow encrypt the JSON
documents and still use Lucene, or at least to avoid storing the index as
plaintext, if that is possible.

Thank you again for all your help and in case this mail has given more
useful details and there are other suggestions or comments, I would be very
happy to read them.

Have a nice day,
Rafaela


On 25 June 2013 20:59, SUJIT PAL <sujit....@comcast.net> wrote:

Hi Rafaela,

I built something along these lines as a proof of concept. All data in the
index was unstored and only fields which were searchable (tokenized and
indexed) were kept in the index. The full record was encrypted and stored
in a MongoDB database. A custom Solr component did the search against the
index, gathered up unique ids of the results, then pulled out the encrypted
data from MongoDB, decrypted it on the fly and rendered the results.

You can find the (Scala) code here:
https://github.com/sujitpal/solr4-extras
(under the src/main/scala/com/mycompany/solr4extras/secure folder).

More information (more or less the same as what I wrote but probably a bit
more readable with inlined code):

http://sujitpal.blogspot.com/2012/12/searching-encrypted-document-collection.html
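For readers who do not want to follow the links, the split described above can be sketched roughly as follows. This is not the code from the repository: it is a minimal illustration in plain Java in which an in-memory term map stands in for the unstored Lucene index and a second map stands in for MongoDB. The class name SplitIndexSketch and its methods are hypothetical.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.*;

/** Hypothetical sketch: plaintext searchable terms, encrypted full records. */
final class SplitIndexSketch {
    private final SecretKeySpec key;
    private final SecureRandom random = new SecureRandom();
    // Stand-in for the unstored, tokenized index: term -> ids of matching records.
    private final Map<String, Set<String>> postings = new HashMap<>();
    // Stand-in for the external store: id -> IV followed by ciphertext.
    private final Map<String, byte[]> encryptedRecords = new HashMap<>();

    SplitIndexSketch(byte[] keyBytes) {
        this.key = new SecretKeySpec(keyBytes, "AES");
    }

    /** Indexes only the searchable text; the full record is stored encrypted. */
    void add(String id, String searchableText, String fullRecord) {
        for (String term : searchableText.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
        encryptedRecords.put(id, encrypt(fullRecord));
    }

    /** Looks up ids in the index, then fetches and decrypts the records on the fly. */
    List<String> search(String term) {
        Set<String> ids = postings.getOrDefault(
                term.toLowerCase(Locale.ROOT), Collections.emptySet());
        List<String> results = new ArrayList<>();
        for (String id : ids) {
            results.add(decrypt(encryptedRecords.get(id)));
        }
        return results;
    }

    private byte[] encrypt(String plain) {
        try {
            byte[] iv = new byte[12];
            random.nextBytes(iv);
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            byte[] ct = c.doFinal(plain.getBytes(StandardCharsets.UTF_8));
            byte[] out = Arrays.copyOf(iv, iv.length + ct.length);
            System.arraycopy(ct, 0, out, iv.length, ct.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    private String decrypt(byte[] stored) {
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, stored, 0, 12));
            byte[] pt = c.doFinal(stored, 12, stored.length - 12);
            return new String(pt, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] k = new byte[16];
        new SecureRandom().nextBytes(k);
        SplitIndexSketch sketch = new SplitIndexSketch(k);
        sketch.add("1", "asthma inhaler", "{\"id\":1,\"note\":\"asthma inhaler\"}");
        System.out.println(sketch.search("asthma"));
    }
}
```

The searchable terms remain in plaintext in this sketch, so only the fields kept out of the index are actually protected.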

There are some obvious data sync concerns with this sort of setup, but as
Adrien points out, you can't index encrypted data.

HTH
Sujit

On Jun 25, 2013, at 4:17 AM, Adrien Grand wrote:

> On Tue, Jun 25, 2013 at 1:03 PM, Rafaela Voiculescu
> <rafaela.voicule...@gmail.com> wrote:
>> Hello,
>
> Hi,
>
>> I am sorry I was not a bit more explicit. I am trying to find an
>> acceptable way to encrypt the data to prevent any access of it in any
>> way unless the person who is trying to access it knows how to decrypt
>> it. As I mentioned, I looked a bit through the patch, but I am not sure
>> of its status.
>
> You can encrypt stored fields, but there is no way to do it correctly
> with fields that have positions indexed: attackers could infer the
> actual terms based on the order of terms (the encrypted version must
> sort the same way as the original terms), frequencies and positions.
>
> --
> Adrien
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>

