I would like to store Lucene indexes in an encrypted format. The only
security requirement is that if an intruder copies files from the file
system, no file will contain raw (unencrypted) data. It is acceptable for
raw data to be visible in raw disk scans. All I want to do is encrypt the
readable index files.

Here is one way to encrypt Lucene indexes: encrypt the entire file on disk
and keep the decrypted version in memory. This is OK with a RAMDirectory,
but does not scale. Using a little-known feature of POSIX, it is possible
to create a memory-mapped file holding a raw copy of the data which cannot
be found through the file system. The POSIX feature is that when you open a
file and then delete it, its name is removed from the directory but its
data stays on disk for as long as the file is open. The data exists as an
invisible file, and the disk space is reclaimed when you close the file
descriptor and release any mapping of it. (This does not work on Windows.)
Let's call this a 'ghost file'.
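
A minimal sketch of the ghost-file trick in Java, assuming a POSIX file
system. The class and method names (GhostFileDemo, ghostRoundTrip) are made
up for illustration and are not part of Lucene. It creates a temp file,
deletes its name, maps it, and checks that the data can still be written and
read back through the mapping.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class GhostFileDemo {
    /**
     * Writes data into an unlinked, memory-mapped temp file and reads it back.
     * Returns true if the path is gone while the mapping still holds the data.
     * Assumes data.length > 0 and POSIX unlink semantics.
     */
    public static boolean ghostRoundTrip(byte[] data) throws IOException {
        Path path = Files.createTempFile("ghost", ".bin");
        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Remove the directory entry; the open channel keeps the data alive.
            Files.delete(path);
            // Mapping READ_WRITE past the current size grows the file.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, data.length);
            buf.put(data);
            buf.flip();
            byte[] back = new byte[data.length];
            buf.get(back);
            return !Files.exists(path) && Arrays.equals(data, back);
        }
    }
}
```

Whether the delete succeeds while the file is open depends on the
platform, so this is only expected to behave as described on POSIX systems.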

If memory-mapping works with ghost files, this seems like it should work: a
new Directory class will create a file and immediately delete it, then
memory-map it. The memory-mapped file will stay allocated inside the JVM
until the JVM closes the associated Directory object. The Directory class
would create an entire 'ghost Lucene index'.

This sequence opens an index:
* open encrypted segment file in memory-mapped format
* create ghost memory-mapped file
* decrypt from encrypted memory into ghost file memory
* close the encrypted index file

Directory.close() wipes the ghost file data and closes the ghost file, and
the file system reclaims the disk space.
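
The open sequence above could be sketched roughly as follows. This is not a
Lucene Directory implementation. GhostOpen, decryptToGhost and wipe are
hypothetical names, and AES in CTR mode is an assumption chosen only because
its output is the same length as its input. Key and IV handling are left
out.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

public class GhostOpen {
    /**
     * Decrypts an encrypted file into an unlinked, memory-mapped ghost file.
     * Assumes a non-empty file smaller than 2 GB (the limit of one mapping).
     */
    public static MappedByteBuffer decryptToGhost(Path encrypted, SecretKey key, byte[] iv)
            throws IOException, GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
        Path ghostPath = Files.createTempFile("ghost", ".seg");
        try (FileChannel enc = FileChannel.open(encrypted, StandardOpenOption.READ);
             FileChannel ghost = FileChannel.open(ghostPath,
                     StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            Files.delete(ghostPath); // the ghost file now has no name
            long size = enc.size();
            MappedByteBuffer in = enc.map(FileChannel.MapMode.READ_ONLY, 0, size);
            MappedByteBuffer out = ghost.map(FileChannel.MapMode.READ_WRITE, 0, size);
            cipher.doFinal(in, out); // decrypt from encrypted memory into ghost memory
            out.flip();
            // Both channels close here; the mappings stay valid without them.
            return out;
        }
    }

    /** Overwrites the ghost mapping with zeros and pushes the zeros to disk. */
    public static void wipe(MappedByteBuffer ghost) {
        ghost.clear();
        while (ghost.hasRemaining()) {
            ghost.put((byte) 0);
        }
        ghost.force();
    }
}
```

Note that Java gives no supported way to unmap a MappedByteBuffer
explicitly, so the ghost file's disk space is only reclaimed once the buffer
is garbage collected.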

This sequence creates an index:
* Directory.createOutput() makes a ghost file and a real file
* all data is written to the ghost file
* close() on the output encrypts the ghost file data into the real file and
  wipes the ghost data
* both files are then closed
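
A rough sketch of the create sequence, under the same assumptions as
before: POSIX unlink semantics, AES/CTR as a stand-in cipher, and a
hypothetical class name (GhostOutput) that is not Lucene's IndexOutput API.
Plain data goes only to the unlinked ghost file. close() encrypts it into
the real file, wipes the ghost data, and closes both files.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

public class GhostOutput implements AutoCloseable {
    private final FileChannel ghost; // unlinked file holding plain data
    private final FileChannel real;  // visible file holding encrypted data
    private final Cipher cipher;

    /** The IV must be unique per file for CTR mode; storing it is left out here. */
    public GhostOutput(Path realPath, SecretKey key, byte[] iv)
            throws IOException, GeneralSecurityException {
        cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        Path ghostPath = Files.createTempFile("ghost", ".out");
        ghost = FileChannel.open(ghostPath,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
        Files.delete(ghostPath); // the ghost file now has no name
        real = FileChannel.open(realPath, StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
    }

    /** Appends plain data to the ghost file only. */
    public void write(byte[] data) throws IOException {
        ByteBuffer src = ByteBuffer.wrap(data);
        while (src.hasRemaining()) {
            ghost.write(src);
        }
    }

    /** Encrypts ghost data into the real file, wipes the ghost, closes both. */
    @Override
    public void close() throws IOException {
        try (FileChannel g = ghost; FileChannel r = real) {
            long size = g.size(); // assumed to fit in one mapping (< 2 GB)
            if (size > 0) {
                MappedByteBuffer plain = g.map(FileChannel.MapMode.READ_WRITE, 0, size);
                MappedByteBuffer enc = r.map(FileChannel.MapMode.READ_WRITE, 0, size);
                cipher.doFinal(plain, enc);
                enc.force();
                plain.clear();
                while (plain.hasRemaining()) {
                    plain.put((byte) 0); // overwrite the plain data with zeros
                }
                plain.force();
            }
        } catch (GeneralSecurityException e) {
            throw new IOException(e);
        }
    }
}
```

Used with try-with-resources, the wipe in close() runs on normal exit and on
exceptions, but not if the JVM itself dies.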

One glaring flaw is: what if close() is not called? The raw data will still
exist in the free disk space.
There are two cases where this would happen:
1) The user fails to call close() but the program finishes normally. This
can be countered by adding a finalize() method that makes sure to clear the
memory.
2) The JVM crashes and shutdown code is not run. The freed ghost data stays
on the hard disk in the free disk space, where it can only be found by
scanning the raw disk. One counter to this is to run the app in a virtual
machine which does not have access to the raw disk devices.

Is this a workable design? Are there any quirks of the Directory
abstraction that make this impossible or pointless? Or quirks in
memory-mapped files or how the JVM implements them?

Thanks for your time,

Lance Norskog

-- 
Lance Norskog
goks...@gmail.com