Kurt Huwig <[email protected]> writes: > Hi, > > I just stumbled upon this page > > http://varnish.projects.linpro.no/wiki/ArchitectNotes > > where the author writes that caching of disk files within the > application hurts performance on current operating systems. Right now, > the data is in memory at least twice: the OS cache and the Derby cache > which sounds suboptimal. And it gets worse if the OS decides to swap > out the application's cache. > > With the widespread use of 64-bit machines and therefore a huge > address space, is it possible to (optionally) disable the page cache > and use memory mapped files instead?
Hi Kurt,

No, there's no way to do that currently. There is an old JIRA issue for using memory mapped files (DERBY-262), but there hasn't been much activity on it yet.

For databases, the problem with double caching has traditionally been solved by disabling the cache for the file system on which the database is stored (that is, enabling direct I/O on that file system). That is, however, a solution that fits a dedicated database server better than a (zero-admin) database embedded in a desktop application.

I think there is more to it than just disabling the page cache and using memory mapped files. Some of the challenges I see are:

- Derby's page cache is an object cache, and the objects also contain deserialized meta-data, not just the raw page data. If we don't have a page cache of some kind, the meta-data must be deserialized on each page access, which could slow the database down rather than speed it up.

- Derby computes and verifies the checksum on a page each time it is fetched into the cache. To get the same level of corruption detection without a page cache, we'd probably have to calculate the checksum on each page access. (For performance reasons one might prefer to have a way of disabling checksum verification in the database and rely on the corruption detection that many modern file systems have built in. But the file system's checks wouldn't detect corruption happening at the application level.)

- There's a lot of code in Derby's store implementation that works on a byte array and would need to be changed to work on a ByteBuffer if we use memory mapped files. This is further complicated by the fact that Derby supports some constrained environments where the FileChannel and ByteBuffer classes are not available, so we'd either need two separate store implementations or have to rewrite large portions of the store to accept two different ways of accessing the raw data.

- To ensure atomicity, Derby must be able to control when a page is written to disk. A page cannot be flushed from the cache to disk until the transaction log, up to the last log record that touched the page, has been forced to disk. So to avoid forcing the transaction log more frequently, and to keep writes that we cannot undo after a crash from reaching the disk, I think we would still need some kind of buffering of write operations.

My guess is that the best way to exploit memory mapped files in Derby is to keep the page cache and use memory mapped files from within it, to prevent double-buffering and reduce the memory footprint. But I'm afraid it'll be a fairly large project.

-- 
Knut Anders
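[To illustrate the hybrid approach Knut suggests at the end — keeping a page cache whose entries wrap mapped buffers, so meta-data deserialization and checksum verification happen once at fault-in while the raw bytes stay OS-backed — here is a hypothetical sketch. None of the names below are Derby code; the page size, CRC32 checksum, and HashMap cache are illustrative assumptions:]

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.CRC32;

public class MappedPageCache {
    static final int PAGE_SIZE = 4096; // made-up page size

    /** Cached page: OS-backed raw bytes plus work done once on fault-in. */
    static class CachedPage {
        final MappedByteBuffer bytes; // raw data lives in the OS page cache
        final long checksum;          // computed once, when the page is faulted in
        CachedPage(MappedByteBuffer bytes, long checksum) {
            this.bytes = bytes;
            this.checksum = checksum;
        }
    }

    private final Map<Long, CachedPage> cache = new HashMap<>();
    private final FileChannel channel;

    MappedPageCache(FileChannel channel) { this.channel = channel; }

    /** Fetch a page, mapping and checksumming it only on a cache miss. */
    CachedPage getPage(long pageNo) throws Exception {
        CachedPage page = cache.get(pageNo);
        if (page == null) {
            MappedByteBuffer buf = channel.map(
                FileChannel.MapMode.READ_WRITE, pageNo * PAGE_SIZE, PAGE_SIZE);
            byte[] raw = new byte[PAGE_SIZE];
            buf.duplicate().get(raw);    // copy once for checksumming only
            CRC32 crc = new CRC32();
            crc.update(raw);
            page = new CachedPage(buf, crc.getValue());
            cache.put(pageNo, page);
        }
        return page;
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("derby-page", ".dat");
        f.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw");
             FileChannel ch = raf.getChannel()) {
            MappedPageCache pc = new MappedPageCache(ch);
            CachedPage p = pc.getPage(0);
            p.bytes.putInt(0, 7);
            // A second fetch hits the cache: no remapping, no re-checksumming.
            System.out.println("same object: " + (pc.getPage(0) == p));
        }
    }
}
```

[Note that this sketch side-steps the atomicity point above: a real implementation would still need controlled write-back rather than letting the OS flush dirty mapped pages whenever it likes.]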
