Btw, +1 to the initiative. I've heard of clients using encrypted HDFS for these use cases. Direct support at the Lucene/Solr level is much better.
On Wed, 15 Mar, 2023, 2:52 pm Ishan Chattopadhyaya, <ichattopadhy...@gmail.com> wrote:

> Does it need to be a first party project?
>
> On Wed, 15 Mar, 2023, 2:46 pm Bruno Roustant, <broust...@apache.org> wrote:
>
>> Hi,
>>
>> I pushed a PR <https://github.com/apache/solr-sandbox/pull/51> in
>> solr-sandbox <https://github.com/apache/solr-sandbox> to propose a
>> Java-level encryption for Solr.
>> This work is the follow up of LUCENE-9379
>> <https://issues.apache.org/jira/projects/LUCENE/issues/LUCENE-9379>.
>>
>> To give some details, here is the overview section of the ENCRYPTION.md
>> <https://github.com/apache/solr-sandbox/blob/e422e3dd4febab54ba9a8d965189b38217552b46/ENCRYPTION.md>
>> file in this PR:
>>
>> This solution provides the encryption of the Lucene index files at the
>> Java level.
>> It encrypts all (or some) the files in a given index with a provided
>> encryption key.
>> It stores the id of the encryption key in the commit metadata (and
>> obviously the key secret is never stored). It is possible to define a
>> different key per Solr Core.
>> This module also provides an EncryptionRequestHandler so that a client
>> can trigger the (re)encryption of a Solr Core index. The (re)encryption
>> is done concurrently while the Solr Core can continue to serve update
>> and query requests.
>>
>> Comparing with an OS-level encryption:
>>
>> - OS-level encryption [1][2] is more performant and more adapted to let
>>   Lucene leverage the OS memory cache. It can manage encryption at block
>>   or filesystem level in the OS. This makes it possible to encrypt with
>>   different keys per-directory, making multi-tenant use-cases possible.
>>   If you can use OS-level encryption, prefer it and skip this Java-level
>>   encryption.
>>
>> - Java-level encryption can be used when the OS-level encryption
>>   management is not possible (e.g. host machine managed by a cloud
>>   provider). It has an impact on performance: expect -20% on most
>>   queries, -60% on multi-term queries.
>>
>> [1] https://wiki.archlinux.org/title/Fscrypt
>> [2] https://www.kernel.org/doc/html/latest/filesystems/fscrypt.html
>>
>> - Bruno