I'm playing around with an application that requires me to manage a large (multi-gigabyte to terabyte), bespoke, frequently-updating data structure in real time... my key concerns are durability and efficiency. While a traditional approach might be to employ an expensive DBMS on expensive hardware... I'm looking to be more innovative. I want to achieve big-iron-beating performance on a shoestring budget... and I'm optimistic, since the problem domain doesn't translate well to traditional RDBMS approaches.

An obvious alternative to a DBMS is to use the file-system directly... in principle this could work, but it would be a laborious process fraught with potential pitfalls with respect to atomicity of updates, transactional recovery (in case of a fail-stop while processing a large update) and so on. Another issue is that, in order to establish an efficient and reliable implementation, it becomes necessary to second-guess details of how file-systems are implemented... this vastly complicates any implementation and might render it unacceptably fragile (subject to unexpected deviations in behaviour as the implementation is moved between hardware, OS versions etc.).
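
For what it's worth, the pattern I'd expect to reach for first is the classic write-to-temporary, fsync, rename, fsync-the-directory dance, which gives atomic replacement of a file on POSIX filesystems without second-guessing filesystem internals. This is purely a sketch - the paths and payload are hypothetical and error handling is reduced to bail-on-failure:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static void die(const char *msg) { perror(msg); exit(EXIT_FAILURE); }

int main(void)
{
    const char *tmp     = "data/index.tmp";   /* hypothetical paths */
    const char *final   = "data/index.dat";
    const char *payload = "new version of the structure\n";

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) die("open tmp");
    if (write(fd, payload, strlen(payload)) < 0) die("write");
    if (fsync(fd) < 0) die("fsync tmp");        /* contents durable before rename */
    if (close(fd) < 0) die("close tmp");

    if (rename(tmp, final) < 0) die("rename");  /* atomic replacement */

    /* fsync the containing directory so the rename itself is durable */
    int dfd = open("data", O_RDONLY | O_DIRECTORY);
    if (dfd < 0) die("open dir");
    if (fsync(dfd) < 0) die("fsync dir");
    close(dfd);
    return 0;
}

Whether that sort of whole-file replacement is workable for frequent incremental updates to a multi-gigabyte structure is doubtful, which is part of why I'm looking at the block-device level directly.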

I've recently discovered that SSDs are becoming more affordable... and this might present new options. There were major hurdles in attempting to establish a strategy for interacting with hard-disk block devices... including, but not limited to, a significant difficulty in establishing the extent to which locality of reference affected performance. Another worry was that it might be difficult to establish that a write had actually completed (i.e. that the data was reliably and durably stored, not just that responsibility for recording it now rested exclusively with the drive). My hope is that SSD technology simplifies some of these concerns, allowing a clear model of access performance that should permit an efficient and reliable implementation.
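
To make the durability question concrete, this is roughly what I have in mind at the block level (just a sketch; the target path and the 4KiB alignment are assumptions): open with O_DIRECT so the page cache stays out of the picture, then rely on fdatasync() to ask the kernel to flush the drive's write cache. What I can't easily establish is whether a given SSD actually honours that flush.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define TARGET "./ssd-scratch.img"   /* hypothetical file on the filesystem under test
                                        (O_DIRECT is not supported on e.g. tmpfs) */
#define BLOCK  4096                  /* assumed alignment and write size */

int main(void)
{
    /* O_DIRECT requires the buffer, offset and length to be block-aligned. */
    void *buf;
    if (posix_memalign(&buf, BLOCK, BLOCK) != 0) { perror("memalign"); return 1; }
    memset(buf, 0xAB, BLOCK);

    int fd = open(TARGET, O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    if (pwrite(fd, buf, BLOCK, 0) != BLOCK) { perror("pwrite"); return 1; }

    /* Only after fdatasync() returns has the kernel asked the drive to flush
       its volatile write cache; whether the drive honours that is the open
       question. */
    if (fdatasync(fd) < 0) { perror("fdatasync"); return 1; }

    close(fd);
    free(buf);
    return 0;
}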

I'd like to hear from anyone who has experience with configuring SSDs for use with (Gentoo) Linux, and especially from anyone who's investigated performance issues. I've read that SSDs typically have a 64KiB block size... this would work fine for me (though I understand it is a significant impediment to high performance with existing file systems). I'd be interested to know if anyone has done performance analysis of SSDs at the device level under Linux... and am intrigued whether there is more to interacting with them than establishing the block size from manufacturer data, then reading/writing appropriately many bytes from the block device and/or flushing appropriately aligned and sized blocks of memory-mapped data. For example, is there an interface to query an SSD about its block size? I'd also like to establish whether I can rely upon my data being durably stored on an SSD once a flush/write returns.
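
As far as I can tell, the kernel does expose the sizes the drive advertises - BLKSSZGET for the logical sector size and, on recent kernels, BLKPBSZGET for the physical block size (also visible under /sys/block/<dev>/queue/) - though I gather these are not necessarily the SSD's internal erase-block size. A minimal sketch, with the device node as a placeholder:

#include <fcntl.h>
#include <linux/fs.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    const char *dev = "/dev/sdb";          /* placeholder device node */
    int fd = open(dev, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    int logical = 0;
    unsigned int physical = 0;
    unsigned long long bytes = 0;

    if (ioctl(fd, BLKSSZGET, &logical) < 0)   { perror("BLKSSZGET");   return 1; }
    if (ioctl(fd, BLKPBSZGET, &physical) < 0) { perror("BLKPBSZGET");  return 1; }
    if (ioctl(fd, BLKGETSIZE64, &bytes) < 0)  { perror("BLKGETSIZE64"); return 1; }

    printf("%s: logical sector %d B, physical block %u B, capacity %llu B\n",
           dev, logical, physical, bytes);
    close(fd);
    return 0;
}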

In a practical sense, I'd like to experiment with some SSD hardware, but there seems to be a lot to choose from. For development purposes I'd not need more than, say, 32GB, and I'm not all that fussed about absolute performance, as long as the relative performance of the various interactions would scale proportionally were I to move to more expensive SSDs in future. I'm interested in any practical anecdotes (or hard statistical data) about the relative merits of the various interfaces for SSDs, and in establishing whether RAID needs to be taken into account when building a performance model.
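
Absent better data, I was planning to hack up something like the following rough micro-benchmark (not a rigorous methodology; the target is a placeholder device or large file) to compare sequential against random 4KiB reads and get a first-order feel for how much locality of reference matters on a given device:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define TARGET "/dev/sdb"    /* placeholder: device or large file under test */
#define BLOCK  4096
#define N      10000

static double now_s(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
    void *buf;
    if (posix_memalign(&buf, BLOCK, BLOCK) != 0) return 1;

    int fd = open(TARGET, O_RDONLY | O_DIRECT);   /* O_DIRECT: bypass page cache */
    if (fd < 0) { perror("open"); return 1; }

    off_t span = lseek(fd, 0, SEEK_END);          /* size of device/file */
    long nblocks = span / BLOCK;
    if (nblocks <= 0) { fprintf(stderr, "target too small\n"); return 1; }

    double t0 = now_s();
    for (long i = 0; i < N; i++)                  /* sequential 4KiB reads */
        pread(fd, buf, BLOCK, (off_t)(i % nblocks) * BLOCK);
    double seq = now_s() - t0;

    t0 = now_s();
    for (long i = 0; i < N; i++)                  /* random 4KiB reads */
        pread(fd, buf, BLOCK, (off_t)(rand() % nblocks) * BLOCK);
    double rnd = now_s() - t0;

    printf("sequential: %.3f s, random: %.3f s (%d x %d B reads)\n",
           seq, rnd, N, BLOCK);
    close(fd);
    free(buf);
    return 0;
}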

Any feedback would be appreciated... especially from any gentooist who is interested in SSD performance/reliability/configuration.


