Kurt Huwig <[email protected]> writes: > Hi, > > I just stumbled upon this page > > http://varnish.projects.linpro.no/wiki/ArchitectNotes > > where the author writes that caching of disk files within the > application hurts performance on current operating systems. Right now, > the data is in memory at least twice: the OS cache and the Derby cache > which sounds suboptimal. And it gets worse if the OS decides to swap > out the application's cache. > > With the widespread use of 64-bit machines and therefore a huge > address space, is it possible to (optionally) disable the page cache > and use memory mapped files instead?
Hi Kurt,

No, there's no way to do that currently. There is an old JIRA issue for using memory mapped files (DERBY-262), but there hasn't been much activity on it yet.

For databases, the problem with double caching has traditionally been solved by disabling the cache for the file system on which the database is stored (that is, enabling direct I/O on that file system). That is, however, a solution that fits a dedicated database server better than a (zero-admin) database embedded in a desktop application.

I think there is more to it than just disabling the page cache and using memory mapped files. Some of the challenges I see are:

- Derby's page cache is an object cache, and the objects also contain deserialized meta-data, not just the raw page data. If we don't have a page cache of some kind, the meta-data must be deserialized on each page access, which could slow the database down rather than speed it up.

- Derby computes and verifies the checksum on a page each time it is fetched into the cache. To get the same level of corruption detection without a page cache, we'd probably have to calculate the checksum on each page access. (For performance reasons one might prefer to have a way of disabling checksum verification in the database and rely on the corruption detection that many modern file systems have built in. But the file system's checks wouldn't detect corruption happening at the application level.)

- There's a lot of code in Derby's store implementation that works on a byte array and would need to be changed to work on a ByteBuffer if we use memory mapped files. This is further complicated by the fact that Derby supports some constrained environments where the FileChannel and ByteBuffer classes are not available, so we'd either need two separate store implementations or have to rewrite large portions of the store to accept two different ways of accessing the raw data.

- To ensure atomicity, Derby must be able to control when a page is written to disk. A page cannot be flushed from the cache to disk until the transaction log, up to the last log record that touched the page, has been forced to disk. So to avoid forcing the transaction log more frequently, and to keep writes that we cannot undo after a crash from reaching the disk, I think we would still need some kind of buffering of write operations.

My guess is that the best way to exploit memory mapped files in Derby is to keep the page cache and use memory mapped files from within it, to prevent double-buffering and reduce the memory footprint. But I'm afraid it'll be a fairly large project.

-- 
Knut Anders
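[To illustrate the hybrid approach Knut suggests at the end — keeping a page cache whose entries wrap mapped buffers, so meta-data deserialization and checksum verification happen once at fault-in while the raw bytes stay OS-backed — here is a hypothetical sketch. None of the names below are Derby code; the page size, CRC32 checksum, and HashMap cache are illustrative assumptions:]

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.CRC32;

public class MappedPageCache {
    static final int PAGE_SIZE = 4096; // made-up page size

    /** Cached page: OS-backed raw bytes plus work done once on fault-in. */
    static class CachedPage {
        final MappedByteBuffer bytes; // raw data lives in the OS page cache
        final long checksum;          // computed once, when the page is faulted in
        CachedPage(MappedByteBuffer bytes, long checksum) {
            this.bytes = bytes;
            this.checksum = checksum;
        }
    }

    private final Map<Long, CachedPage> cache = new HashMap<>();
    private final FileChannel channel;

    MappedPageCache(FileChannel channel) { this.channel = channel; }

    /** Fetch a page, mapping and checksumming it only on a cache miss. */
    CachedPage getPage(long pageNo) throws Exception {
        CachedPage page = cache.get(pageNo);
        if (page == null) {
            MappedByteBuffer buf = channel.map(
                FileChannel.MapMode.READ_WRITE, pageNo * PAGE_SIZE, PAGE_SIZE);
            byte[] raw = new byte[PAGE_SIZE];
            buf.duplicate().get(raw);    // copy once for checksumming only
            CRC32 crc = new CRC32();
            crc.update(raw);
            page = new CachedPage(buf, crc.getValue());
            cache.put(pageNo, page);
        }
        return page;
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("derby-page", ".dat");
        f.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw");
             FileChannel ch = raf.getChannel()) {
            MappedPageCache pc = new MappedPageCache(ch);
            CachedPage p = pc.getPage(0);
            p.bytes.putInt(0, 7);
            // A second fetch hits the cache: no remapping, no re-checksumming.
            System.out.println("same object: " + (pc.getPage(0) == p));
        }
    }
}
```

[Note that this sketch side-steps the atomicity point above: a real implementation would still need controlled write-back rather than letting the OS flush dirty mapped pages whenever it likes.]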
