I've also observed that HDFS supports client-provided encryption... or so I recall from when I looked many months ago. Someone ought to do a blog post or write-up on that.
~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Wed, Mar 15, 2023 at 5:28 AM Ishan Chattopadhyaya <ichattopadhy...@gmail.com> wrote:

> Btw, +1 to the initiative. I've heard of clients using encrypted HDFS for
> these use cases. Direct support at the Lucene/Solr level is much better.
>
> On Wed, 15 Mar, 2023, 2:52 pm Ishan Chattopadhyaya, <ichattopadhy...@gmail.com> wrote:
>
> > Does it need to be a first-party project?
> >
> > On Wed, 15 Mar, 2023, 2:46 pm Bruno Roustant, <broust...@apache.org> wrote:
> >
> >> Hi,
> >>
> >> I pushed a PR <https://github.com/apache/solr-sandbox/pull/51> in
> >> solr-sandbox <https://github.com/apache/solr-sandbox> to propose
> >> Java-level encryption for Solr.
> >> This work is the follow-up to LUCENE-9379
> >> <https://issues.apache.org/jira/projects/LUCENE/issues/LUCENE-9379>.
> >>
> >> To give some details, here is the overview section of the ENCRYPTION.md
> >> <https://github.com/apache/solr-sandbox/blob/e422e3dd4febab54ba9a8d965189b38217552b46/ENCRYPTION.md>
> >> file in this PR:
> >>
> >> This solution provides encryption of the Lucene index files at the Java level.
> >> It encrypts all (or some) of the files in a given index with a provided
> >> encryption key.
> >> It stores the id of the encryption key in the commit metadata (and obviously
> >> the key secret is never stored). It is possible to define a different key per
> >> Solr Core.
> >> This module also provides an EncryptionRequestHandler so that a client can
> >> trigger the (re)encryption of a Solr Core index. The (re)encryption is done
> >> concurrently while the Solr Core continues to serve update and query requests.
> >>
> >> Compared with OS-level encryption:
> >>
> >> - OS-level encryption [1][2] is more performant and better suited to let
> >> Lucene leverage the OS memory cache. It can manage encryption at block or
> >> filesystem level in the OS. This makes it possible to encrypt with different
> >> keys per directory, making multi-tenant use cases possible.
> >> If you can use OS-level encryption, prefer it and skip this Java-level
> >> encryption.
> >>
> >> - Java-level encryption can be used when OS-level encryption management is
> >> not possible (e.g. host machine managed by a cloud provider). It has an
> >> impact on performance: expect -20% on most queries, -60% on multi-term
> >> queries.
> >>
> >> [1] https://wiki.archlinux.org/title/Fscrypt
> >> [2] https://www.kernel.org/doc/html/latest/filesystems/fscrypt.html
> >>
> >> - Bruno
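For anyone wanting to picture the mechanism described in the ENCRYPTION.md overview (encrypt index file bytes with a provided key, record only the key id in the commit metadata, never persist the secret), here is a minimal Java sketch using only javax.crypto. It is not the solr-sandbox module's API. The class name, the in-memory KEY_STORE, the "encryption.key.id" metadata name, and the choice of AES/CTR/NoPadding with a random IV prefix are all assumptions for illustration, and the plain Map stands in for Lucene's commit user data.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch, not the solr-sandbox API: per-file AES-CTR encryption where
// only the key id (never the secret) is recorded in the commit metadata.
public class IndexFileEncryptionSketch {

  // Hypothetical key store: key id -> secret. Secrets live here, never in the index.
  static final Map<String, byte[]> KEY_STORE = new HashMap<>();

  // Hypothetical commit-metadata entry name; only the key id is written under it.
  static final String KEY_ID_PARAM = "encryption.key.id";

  static final int IV_LENGTH = 16; // AES block size in bytes
  private static final SecureRandom RANDOM = new SecureRandom();

  // Runs AES in CTR mode over input[offset, offset + length).
  private static byte[] aesCtr(int mode, byte[] key, byte[] iv, byte[] input, int offset, int length) {
    try {
      Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
      cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
      return cipher.doFinal(input, offset, length);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("AES-CTR operation failed", e);
    }
  }

  // Encrypts one file's bytes as [random IV | ciphertext] and records the key id.
  static byte[] encrypt(byte[] plain, String keyId, Map<String, String> commitMetadata) {
    byte[] key = KEY_STORE.get(keyId);
    if (key == null) {
      throw new IllegalArgumentException("Unknown key id: " + keyId);
    }
    byte[] iv = new byte[IV_LENGTH];
    RANDOM.nextBytes(iv);
    byte[] body = aesCtr(Cipher.ENCRYPT_MODE, key, iv, plain, 0, plain.length);
    byte[] out = Arrays.copyOf(iv, IV_LENGTH + body.length); // IV first, then room for body
    System.arraycopy(body, 0, out, IV_LENGTH, body.length);
    commitMetadata.put(KEY_ID_PARAM, keyId); // the id only, not the secret
    return out;
  }

  // Looks up the key via the id stored in the commit metadata, then decrypts.
  static byte[] decrypt(byte[] encrypted, Map<String, String> commitMetadata) {
    String keyId = commitMetadata.get(KEY_ID_PARAM);
    byte[] key = KEY_STORE.get(keyId);
    if (key == null) {
      throw new IllegalStateException("No key available for id: " + keyId);
    }
    byte[] iv = Arrays.copyOf(encrypted, IV_LENGTH);
    return aesCtr(Cipher.DECRYPT_MODE, key, iv, encrypted, IV_LENGTH, encrypted.length - IV_LENGTH);
  }

  public static void main(String[] args) {
    // One key per Solr Core would mean one key id per core's commit metadata.
    byte[] secret = new byte[16]; // AES-128 key
    RANDOM.nextBytes(secret);
    KEY_STORE.put("core1-key", secret);

    Map<String, String> core1Commit = new HashMap<>();
    byte[] plain = "term dictionary bytes".getBytes(StandardCharsets.UTF_8);
    byte[] enc = encrypt(plain, "core1-key", core1Commit);
    System.out.println(core1Commit); // prints {encryption.key.id=core1-key}
    System.out.println(Arrays.equals(plain, decrypt(enc, core1Commit))); // prints true
  }
}
```

CTR is used in this sketch because the ciphertext stays the same length as the plaintext (plus the IV header) and decryption can start at any offset by advancing the counter, which fits Lucene's random-access reads; whether the PR uses the same mode is not stated in this thread. A real module would likely need to wrap Lucene's Directory and IndexInput/IndexOutput rather than whole byte arrays.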