[freenet-dev] Disk I/O thread

Matthew Toseland Wed, 29 Aug 2012 23:40:13 +0100

Response to a long thread on FMS about how to reduce Freenet's disk I/O, what 
realistic system requirements are, when we can expect SSDs to take over, and 
whether Freenet will kill commodity disks as a matter of routine.


================================================================================

I'm just going to reply to everyone here.

First, hard disk writes are not the only limited resource: As has been pointed 
out, RAM is limited too. So we can't assume that node.db4o will fit in RAM 
(much less persistent-blob.tmp, which is intimately related to node.db4o). But 
we should make good use of this happy situation when it occurs.

Second, if there is plenty of RAM, the OS will cache the file. So reads aren't 
a problem - they just create more CPU usage. Writes are the big problem: How do 
we reduce the number of database writes? As far as I know, if the node is 
"idle", in the sense that the requests are failing, we do no database writes at 
all. However, there may still be some maintenance writes.

One big question is whether a short burst of writes every so often is 
preferable to writes every second. Possible benefits:
- It's closer to what the hard disks expect so hopefully will have less impact 
on hard disk lifespans.
- The seeks can be quite small, so it should be fast-ish.

A possible drawback is that, since a burst is more intense, it might have a 
bigger negative impact on the rest of the system (for a short time). That might 
be bad for e.g. online gaming, although we will want a gamer mode or something 
eventually (ideally with platform-specific autodetection helpers).

Considering the datastore alone, it is perfectly feasible, and safe, to 
aggregate writes in memory, provided there is sufficient memory (judged by the 
overall heap limit, which in turn is based on the detected amount of memory). 
One complication is if the data doesn't hit the main datastore it should still 
be in the ULPR/slashdot cache, so we'd need to allow that to access the 
in-memory blocks where appropriate.
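
To make the datastore idea concrete, here is a rough sketch of what aggregating 
writes in memory could look like. This is not Freenet code: the class 
BufferedBlockStore, its methods, the String keys and the byte budget are all 
made up for illustration. Blocks are held in a map until a byte budget is 
reached and then written out in one burst, and getPending() is the hook that 
would let another cache see blocks that are still only in memory.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical sketch: aggregates datastore writes in memory, flushes in bursts. */
final class BufferedBlockStore {
    // Insertion-ordered so a flush writes blocks in arrival order.
    private final Map<String, byte[]> pending = new LinkedHashMap<>();
    private final int maxPendingBytes;                    // assumed budget, e.g. from the heap limit
    private final BiConsumer<String, byte[]> diskWriter;  // stands in for the real on-disk store
    private int pendingBytes = 0;

    BufferedBlockStore(int maxPendingBytes, BiConsumer<String, byte[]> diskWriter) {
        this.maxPendingBytes = maxPendingBytes;
        this.diskWriter = diskWriter;
    }

    /** Buffers a block; triggers a burst write once the budget is reached. */
    synchronized void put(String key, byte[] block) {
        byte[] old = pending.put(key, block);
        pendingBytes += block.length - (old == null ? 0 : old.length);
        if (pendingBytes >= maxPendingBytes) {
            flush();
        }
    }

    /** Returns a block that is buffered but not yet on disk, or null. */
    synchronized byte[] getPending(String key) {
        return pending.get(key);
    }

    /** Writes all buffered blocks in one burst; returns how many were written. */
    synchronized int flush() {
        int count = pending.size();
        pending.forEach(diskWriter);
        pending.clear();
        pendingBytes = 0;
        return count;
    }
}
```

A real version would also call flush() from a timer and on shutdown, and would 
size the budget from the heap limit rather than take a fixed number.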

Now, regarding the database (node.db4o):
- It is hard to make uploads not use lots of database queries without 
substantial changes. I may look into it but expect it to be difficult.
- Accepting limited potential data loss is not, at present, an option. The 
database is more likely to completely die than just lose some changes. This is 
why we fsync on commit, and commit frequently. Since we abuse the nominally 
ACID nature of the database (we never rollback), we can (and do) commit only 
when something important happens or periodically, but there is still a lot of 
traffic. Sadly Freetalk/WoT do use rollback, so they have to commit EVERY TIME.
- Periodic backups (synchronized with the persistent-blob file) could avoid the 
need for fsync. This would greatly reduce the actual disk writes by allowing 
the operating system to optimise them properly.
- In theory we could do more aggressive caching once we have this 
infrastructure, up to and including keeping the whole thing in RAM and writing 
it periodically. We would need to handle smoothly the case where it grows until 
it no longer fits.
- The actual blocks are just big linear writes, so it's much more efficient to 
buffer database writes than to buffer unwritten blocks. If we have a lot of RAM 
it may make sense to do both, which would further complicate the above.
- Database jobs can be very slow especially if RAM is limited (meaning we have 
to do lots of reads because the OS isn't caching the whole file). Things like 
unpacking the next layer of a splitfile can be hideously slow. We can't 
necessarily aggregate commits, at least not at the job level. On the other 
hand, we DO aggregate commits at the job level to some degree, in the sense 
that while a big job such as above is running, the new blocks coming in are 
queued; eventually we stop fetching new blocks. IIRC mostly they are written to 
disk to save memory. :|
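
For anyone not familiar with the "commit only when something important happens 
or periodically" policy mentioned in the list above, a minimal sketch of that 
decision follows. It is illustrative only: CommitPolicy and its method names 
are invented, it does not touch db4o at all, and the caller is assumed to do 
the actual commit (and fsync) whenever a method returns true.

```java
/** Hypothetical sketch: decides when a never-rollback client should commit. */
final class CommitPolicy {
    private final long maxIntervalMillis; // assumed upper bound between commits
    private long lastCommitMillis;
    private boolean dirty = false;        // true if there are uncommitted changes

    CommitPolicy(long maxIntervalMillis, long nowMillis) {
        this.maxIntervalMillis = maxIntervalMillis;
        this.lastCommitMillis = nowMillis;
    }

    /** Records a change; returns true if the caller should commit now. */
    boolean onChange(boolean important, long nowMillis) {
        dirty = true;
        return shouldCommit(important, nowMillis);
    }

    /** Periodic check; returns true if a pending change has waited long enough. */
    boolean onTick(long nowMillis) {
        return shouldCommit(false, nowMillis);
    }

    private boolean shouldCommit(boolean important, long nowMillis) {
        if (!dirty) {
            return false; // nothing to write, so no disk traffic at all
        }
        if (important || nowMillis - lastCommitMillis >= maxIntervalMillis) {
            dirty = false;
            lastCommitMillis = nowMillis;
            return true;
        }
        return false;
    }
}
```

This only works because nothing is ever rolled back. A client that relies on 
rollback cannot defer commits like this.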

A lot of the above depends on an awful lot of RAM being available. Possibly we 
should tweak the autodetection. Certainly we will affect system performance by 
using too much RAM, just as we do with too many disk writes.

Unfortunately there are other places where we write frequently too, such as the 
peers files. These need debugging.

So, what of the above is not already on the bug tracker?

1. Do we want to aggregate writes to the datastore and write them periodically? 
(Implementation issues mentioned above)

2. Caching of blocks for persistent-blob.tmp, as well as of the database 
itself, if we have lots of RAM, after implementing auto-backups.

3. Can we give Freenet any more RAM? The current allocation (the wrapper memory 
limit, which does not include things like thread stacks) is:
Detected RAM    Wrapper memory limit
< 512MB         128MB
< 1GB           192MB
< 2GB           256MB
otherwise       512MB
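
Written as code, the current allocation looks roughly like the sketch below. 
The class and method names are invented for illustration, and I am assuming 
binary units (1GB = 1024MB) and that "<" is a strict comparison. The real 
autodetection logic may differ in both respects.

```java
/** Hypothetical sketch of the allocation table above. */
final class HeapLimit {
    /** Maps detected system RAM (in MiB) to the wrapper memory limit (in MiB). */
    static int wrapperLimitMiB(long detectedMiB) {
        if (detectedMiB < 512) {
            return 128;
        }
        if (detectedMiB < 1024) {
            return 192;
        }
        if (detectedMiB < 2048) {
            return 256;
        }
        return 512;
    }
}
```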