Btw, +1 to the initiative. I've heard of clients using encrypted HDFS for
these use cases. Direct support at the Lucene/Solr level is much better.

On Wed, 15 Mar, 2023, 2:52 pm Ishan Chattopadhyaya, <
ichattopadhy...@gmail.com> wrote:

> Does it need to be a first party project?
>
> On Wed, 15 Mar, 2023, 2:46 pm Bruno Roustant, <broust...@apache.org>
> wrote:
>
>> Hi,
>>
>> I pushed a PR <https://github.com/apache/solr-sandbox/pull/51> in
>> solr-sandbox <https://github.com/apache/solr-sandbox> to propose a
>> Java-level encryption for Solr.
>> This work is the follow-up to LUCENE-9379
>> <https://issues.apache.org/jira/projects/LUCENE/issues/LUCENE-9379>.
>>
>> To give some details, here is the overview section of the ENCRYPTION.md
>> <https://github.com/apache/solr-sandbox/blob/e422e3dd4febab54ba9a8d965189b38217552b46/ENCRYPTION.md>
>> file in this PR:
>>
>> This solution provides the encryption of the Lucene index files at the
>> Java level.
>> It encrypts all (or some of) the files in a given index with a provided
>> encryption key.
>> It stores the id of the encryption key in the commit metadata (and
>> obviously the key secret is never stored). It is possible to define a
>> different key per Solr Core.
>> This module also provides an EncryptionRequestHandler so that a client can
>> trigger the (re)encryption of a Solr Core index. The (re)encryption is done
>> concurrently while the Solr Core can continue to serve update and query
>> requests.
>>
>> Comparing with an OS-level encryption:
>>
>> - OS-level encryption [1][2] is more performant and better suited to
>> letting Lucene leverage the OS memory cache. It can manage encryption at
>> block or filesystem level in the OS. This makes it possible to encrypt
>> with different keys per directory, making multi-tenant use cases possible.
>> If you can use OS-level encryption, prefer it and skip this Java-level
>> encryption.
>>
>> - Java-level encryption can be used when OS-level encryption management
>> is not possible (e.g. host machine managed by a cloud provider). It has an
>> impact on performance: expect -20% on most queries, -60% on multi-term
>> queries.
>>
>> [1] https://wiki.archlinux.org/title/Fscrypt
>> [2] https://www.kernel.org/doc/html/latest/filesystems/fscrypt.html
>>
>> - Bruno
>>
>
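
For readers new to the idea, the quoted overview (encrypt the file bytes with a provided key, record only the key id alongside the commit) can be sketched roughly as below. This is a minimal illustration and not the code from the PR: the class name, the "keyId" metadata name, the in-memory key map standing in for an external key supplier, and the choice of AES/CTR are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: not the solr-sandbox implementation.
public class EncryptionSketch {
    // Assumed metadata name for the key id; the real module may use another.
    static final String KEY_ID_PARAM = "keyId";
    static final int IV_LENGTH = 16; // AES block size in bytes

    // Runs AES in CTR mode (an assumption for this sketch) in the given mode.
    static byte[] crypt(int mode, byte[] key, byte[] iv, byte[] data) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return cipher.doFinal(data);
    }

    // Encrypts the data and prepends a fresh random IV to the result.
    static byte[] encrypt(byte[] key, byte[] plain) throws Exception {
        byte[] iv = new byte[IV_LENGTH];
        new SecureRandom().nextBytes(iv);
        byte[] encrypted = crypt(Cipher.ENCRYPT_MODE, key, iv, plain);
        byte[] stored = new byte[IV_LENGTH + encrypted.length];
        System.arraycopy(iv, 0, stored, 0, IV_LENGTH);
        System.arraycopy(encrypted, 0, stored, IV_LENGTH, encrypted.length);
        return stored;
    }

    // Reads the IV from the first bytes, then decrypts the remainder.
    static byte[] decrypt(byte[] key, byte[] stored) throws Exception {
        byte[] iv = Arrays.copyOfRange(stored, 0, IV_LENGTH);
        byte[] encrypted = Arrays.copyOfRange(stored, IV_LENGTH, stored.length);
        return crypt(Cipher.DECRYPT_MODE, key, iv, encrypted);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for an external key supplier: key id -> key secret.
        Map<String, byte[]> keySupplier = new HashMap<>();
        byte[] secret = new byte[16];
        new SecureRandom().nextBytes(secret);
        keySupplier.put("key-1", secret);

        // Only the key id goes into the (simulated) commit metadata.
        Map<String, String> commitMetadata = new HashMap<>();
        commitMetadata.put(KEY_ID_PARAM, "key-1");

        byte[] stored = encrypt(keySupplier.get("key-1"),
                "index file bytes".getBytes(StandardCharsets.UTF_8));

        // On read, the key id from the metadata selects the secret.
        byte[] restored = decrypt(keySupplier.get(commitMetadata.get(KEY_ID_PARAM)), stored);
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```

The point of the sketch is only the separation of concerns: the index side sees a key id, and the secret lives with whatever supplies keys. Re-encryption under a new key would then amount to rewriting files with the new secret and committing the new key id.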
