> Freetalk and WoT may be better designed on that level. However they
> produce more disk I/O. And the reason for this is, mainstream database
> practice and theory are designed for two cases:
> 1. Lots of data (absolute amount and transactions/sec) on fast, reliable
> hardware with professional sysadmins.
> 2. Tiny amounts of data on commodity hardware.
> If you have lots of data on commodity disks, it falls down. And note that
> mainstream use of databases in the second case frequently fails. For
> example, I have a huge set of bookmarks in a tree on one of my browsers.
> Every time I create a new category, another one gets renamed.
That's not a database error, that's an application error for sure. The reason
databases fail on standard consumer hardware is that these systems buffer
writes on various levels, undermining the whole ACID architecture. This leads
to half-applied transactions or outright database corruption on power failure
or even a mere OS crash. The results are random, though the standard outcome
is a corrupt db which just doesn't come back up; any replicable behaviour is
unlikely to be related to this.
Apart from that, databases of any size will run reliably on any hardware as
long as you ensure fsync does what it is supposed to do and don't,
intentionally or unintentionally, disable any of the transaction-based safety
features. Performance will suffer on consumer hardware, greatly so if the
indices are too big to be cached, but identical workloads run slower on slower
hardware, no surprise there. It doesn't mean that there is no point in running
large DBs on consumer hardware, or that doing so is somehow inherently
unreliable and you end up with randomly modified data sets.
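To be concrete about what "fsync does what it is supposed to do" means, here
is a minimal Java sketch of the pattern every DBMS commit ultimately boils
down to; the file name and record are made up for illustration:

    import java.io.RandomAccessFile;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.StandardCharsets;

    public class DurableWrite {
        public static void main(String[] args) throws Exception {
            try (RandomAccessFile raf = new RandomAccessFile("commit.log", "rw")) {
                FileChannel channel = raf.getChannel();
                ByteBuffer record = ByteBuffer.wrap(
                        "COMMIT txn=42\n".getBytes(StandardCharsets.UTF_8));
                while (record.hasRemaining()) {
                    channel.write(record); // lands in the OS page cache first
                }
                // force(true) is Java's fsync: block until data and metadata
                // are on stable storage. If a write buffer along the way lies
                // about this, the ACID guarantees silently evaporate.
                channel.force(true);
            }
        }
    }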
>> The approach of fred to not use it is just wrong from a computer-science
>> perspective.
>>
>> The fact that it does not perform so well is an implementation issue of the
>> database, not of the client code which uses the database.
>
>No it is not. The load you put on it requires many seeks.
Are you sure of that? The primary database will require seeking on pretty much
any access due to its nature, alright, but it doesn't actually need to be
accessed all that often. How many requests must a busy node handle per second?
Maybe 10, if that, and those will be lucky to cause 10 IOs. Your standard SATA
disk can deal with more than that; the standard figures are usually 50-100
IOPS. Write requests to the DBs, caused by insert requests or fetched unknown
data, will require multiple seeks and drive the load up a bit, but if the
"writes per second to db" figures of my node are any indication, this won't
cause severe disk load either; it just doesn't occur often enough.
This also matches my general experience with Freenet; the main DBs aren't the
issue. Just keep the node running on its own and there won't be any problems,
even with 500+ GB stores on old 5400 rpm SATA disks. Load the node with local
requests, be it downloads or something else, and the whole thing gets severely
disk-limited very fast. With stuff like WoT, Freetalk or the Spider it is even
worse; these can easily push things to unbearable levels in my experience.
So IMHO the offender is rather the db4o database, which is basically what the
client code uses. The load here is different, though; I don't really see why
it must strictly be seek-heavy for long stretches of time. Splitfile handling,
for example, is more of a bulk-data affair. Ideally it should pull the data
out of the db as needed during a decode, keep temporary results in memory or
in tempfiles if necessary, and only store the final result back to the db,
minimising writes. That is still the equivalent of reading a terribly
fragmented file from disk, but it's not an insurmountable task for a standard
disk. Considering how IO-limited splitfile decoding often is on standard
disks, and the time it takes, I really suspect that it loads the DB with
unneeded stuff, trying to minimise memory footprint, temp-file usage or
something like that. Especially since just the data retrieval, before the
request actually completes, is often already IO-heavy in my experience.
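To illustrate what I mean by minimising writes during a decode, here is a
rough Java sketch. Every name in it (Database, fetchBlock, fecDecode,
storeResult) is invented for the example; this is not the actual fred API:

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical decode loop: read blocks from the db, do all the
    // intermediate work in memory (or a tempfile), write back exactly
    // one final result.
    public class SplitfileDecodeSketch {

        byte[] decodeSegment(Database db, long segmentId, int blockCount) {
            List<byte[]> blocks = new ArrayList<>();
            for (int i = 0; i < blockCount; i++) {
                // Reads may seek, but they are cacheable and need no fsync.
                blocks.add(db.fetchBlock(segmentId, i));
            }
            // All intermediate FEC state stays out of the db; if the node
            // crashes we simply redo the decode from the still-present
            // blocks instead of persisting partial results.
            byte[] result = fecDecode(blocks);
            // One write, one commit, one fsync for the whole segment.
            db.storeResult(segmentId, result);
            return result;
        }

        // Stand-ins so the sketch is self-contained.
        interface Database {
            byte[] fetchBlock(long segmentId, int index);
            void storeResult(long segmentId, byte[] result);
        }

        byte[] fecDecode(List<byte[]> blocks) {
            // Placeholder for the actual FEC/erasure decoding.
            int total = 0;
            for (byte[] b : blocks) total += b.length;
            byte[] out = new byte[total];
            int off = 0;
            for (byte[] b : blocks) {
                System.arraycopy(b, 0, out, off, b.length);
                off += b.length;
            }
            return out;
        }
    }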
The same goes for Freetalk & co: they can't really cause many transactions per
second, since Freenet just can't fetch data fast enough to cause enough
changes. So either they run really complex queries on really large or complex
data sets, or the db is inefficiently structured or inefficiently accessed. No
idea, but it certainly seems weird.
I suspect in general that the db4o database is hit with way too many writes
per second, which is what kills DB performance fast. Even your average
professionally run DB, the kind you talk about above, quite often runs on a
RAID 5 or even RAID 6 array of 10k rpm disks, simply because that's a
cost-effective way to store bulky stuff, and cost is saved everywhere if at
all possible. Those arrays will happily deal with very high read IOPS compared
to your standard consumer disk, but won't be happy at all about lots of random
writes either, although big battery-backed write buffers help to some degree.
So I am not so sure that this is just an issue of consumer hardware or of the
inherent type of load Freenet has to deal with, and not actually inefficient
use of the DB paired with a DBMS that is inefficient for this use-case to
begin with.
Apart from the whole performance thing:
I doubt that fred will get away from rollbacks completely just by never
manually triggering one; at the very least a node crash and any query errors
should cause an automatic rollback to the state of the last commit. Does fred
take this into account, meaning that it only commits when one logical
"transaction" is done? If fred just commits every x DB actions, or if the
trigger is time-based, then this could cause logical data corruption in the
form of orphaned entries, partially modified ones and so on whenever the node
is killed hard in some form or other, which could then explain some of the
db4o corruption going on. That would basically be true transaction abuse.
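In code the difference looks roughly like this. I'm assuming the db4o embedded
API here (ObjectContainer with store/commit/rollback); the domain classes are
made up for the example:

    import com.db4o.ObjectContainer;

    // Hypothetical example: inserting a download consisting of a request
    // entry plus its block entries. Commit exactly once, at the logical
    // transaction boundary.
    public class CommitBoundarySketch {

        void addDownload(ObjectContainer db, Request request, Block[] blocks) {
            try {
                db.store(request);
                for (Block b : blocks) {
                    db.store(b);
                    // WRONG place to commit: doing it here every x stores,
                    // or on a timer, means a hard kill leaves a Request
                    // with half its Blocks - orphaned entries the code
                    // never expects to see.
                }
                // RIGHT place: one commit once the logical transaction is
                // complete. A crash before this line rolls back to the
                // last commit and the data set stays consistent.
                db.commit();
            } catch (RuntimeException e) {
                db.rollback();
                throw e;
            }
        }

        // Invented domain classes, just to keep the sketch self-contained.
        static class Request {}
        static class Block {}
    }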
Considering rollbacks generally "good" or "the right way", though, isn't
really right in my opinion. Avoiding transactions is evil; avoiding rollbacks
might be, but isn't necessarily so. Abusing rollbacks to implement standard
application logic (just always start inserting or updating stuff, and if you
notice halfway through that you don't actually need to, roll back) is
certainly evil too, using up DB resources and locking tables for no good
reason. Best paired with long-running transactions involving as many tables as
possible, to truly lock down the whole DB for everyone else for as long as
possible.
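A quick sketch of that anti-pattern next to the sane version, again with
invented names:

    import com.db4o.ObjectContainer;

    public class RollbackAbuseSketch {

        // Anti-pattern: write first, ask questions later. Burns DB
        // resources and holds locks just to discover the work was
        // never needed.
        void updateAbusingRollback(ObjectContainer db, Profile p) {
            db.store(p);           // speculative write
            if (!needsUpdate(p)) {
                db.rollback();     // rollback as an "oops, never mind"
                return;
            }
            db.commit();
        }

        // Sane version: decide first, touch the DB only when necessary.
        // Rollback stays what it is for: recovering from actual errors.
        void updateCheckingFirst(ObjectContainer db, Profile p) {
            if (!needsUpdate(p)) {
                return;
            }
            db.store(p);
            db.commit();
        }

        boolean needsUpdate(Profile p) { return true; } // placeholder
        static class Profile {}
    }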
> Of course, it might require fewer seeks if it was using a well-designed
> SQL schema rather than trying to store objects. And I'm only talking
> about writes here; obviously if everything doesn't fit in memory you
> need to do seeks on read as well, and they can be very involved given
> db4o's lack of two-column indexes. However, because of the constant
> fsync's, we still need loads of seeks *even if the whole thing fits in
> the OS disk cache*!
I don't know db4o, but single-column indices shouldn't be a big problem as
long as the result set per index lookup doesn't get too big. If you have some
kind of boolean column somewhere in a big table, though, a query on it
basically leads to a partial index scan, which is still doable if the index is
in memory but ends up horribly if it is not. Has anyone checked how db4o
executes multi-column queries? Some DBMSs have quite braindead query
optimisers which fall back to table scans at the slightest suboptimal index
situation, so this might be worth a shot ...
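One cheap way to check would be to time a two-column query against its
single-column variants. A minimal probe, assuming db4o's SODA query API
(query/descend/constrain/execute); the Message class and its fields are
invented:

    import com.db4o.ObjectContainer;
    import com.db4o.ObjectSet;
    import com.db4o.query.Query;

    public class TwoColumnQueryProbe {

        // Hypothetical table: boardId is selective, read is a boolean.
        static class Message {
            long boardId;
            boolean read;
        }

        ObjectSet<Message> unreadOnBoard(ObjectContainer db, long boardId) {
            long start = System.nanoTime();
            Query q = db.query();
            q.constrain(Message.class);
            // Without a two-column index the engine has to pick ONE of
            // these constraints to drive the index lookup and filter the
            // rest; if it picks the boolean, this degenerates towards a
            // partial index scan or worse.
            q.descend("boardId").constrain(boardId);
            q.descend("read").constrain(false);
            ObjectSet<Message> result = q.execute();
            System.out.printf("query took %.1f ms, %d hits%n",
                    (System.nanoTime() - start) / 1e6, result.size());
            return result;
        }
    }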
> However, the bottom line is if you have to commit every few seconds, you
> have to fsync every few seconds. IMHO avoiding that problem, for
> instance by turning off fsync and making periodic backups, would
> dramatically improve performance.
Fsync is what keeps your data safe; it isn't some unnecessary habit the DBMS
indulges in because it likes to throw a tantrum. Disabling it and resorting to
backups is the classic way to render one of the core guarantees of any DBMS
worth its name completely and utterly useless, and to end up with messy
corruption-detection routines (no, the db isn't fine just because it still
loads) and, in the end, with completely inconsistent data sets, broken backups
and what not. Have fun with that.
> As regards Freenet I am leaning towards a hand-coded on-disk structure.
> We don't use queries anyway, and mostly we don't need them; most of the
> data could be handled as a series of flat-files, and it would be far
> more robust than db4o, and likely faster too.
Usually one wants to store some kind of object and reference it via a key. To
store it in a flat file one needs to serialise it to a string, or rather to
binary data to be exact. Your standard RDBMS works with tables which can store
any sort of binary data, provides access to it via a key, multiple keys even,
and, using indices, does so resource-efficiently too. This must be fate ;)
Seriously, a hand-coded and optimised flat-file store may outperform a
general-purpose DBMS in some situations, as is the case with all hand-crafted
stuff compared to general-purpose solutions, IF the hand-crafted stuff is done
well. But I very much doubt that you will ever get close to the reliability
your typical DBMS provides, surviving software crashes, hardware malfunctions
and what not, as long as the circumstances allow it to at all.
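Just to give an idea of the baseline any hand-crafted store has to meet: even
the trivial case of replacing one flat file crash-safely needs the
write-tempfile, fsync, atomic-rename dance, otherwise a crash mid-write leaves
a torn file. A minimal Java sketch, file names invented:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;
    import java.nio.file.StandardOpenOption;

    public class AtomicFlatFileUpdate {

        // Replace the target file so that a crash at any point leaves
        // either the complete old contents or the complete new contents.
        static void atomicReplace(Path target, byte[] newContents)
                throws IOException {
            Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
            try (FileChannel ch = FileChannel.open(tmp,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                    StandardOpenOption.TRUNCATE_EXISTING)) {
                ch.write(ByteBuffer.wrap(newContents));
                ch.force(true); // fsync the new contents before the rename
            }
            // The rename is atomic on POSIX filesystems. And this is only
            // ONE of the failure cases (torn in-place updates, index/data
            // mismatches, partial appends) a DBMS already handles for us.
            Files.move(tmp, target,
                    StandardCopyOption.ATOMIC_MOVE,
                    StandardCopyOption.REPLACE_EXISTING);
        }
    }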
Now, trading reliability for more speed may be a valid thing sometimes,
agreed, the same way that trading architectural "cleanness", normalisation for
example, for more performance is sometimes valid. But while nobody will really
care if some key in their node's store links to the wrong data, and most won't
even care if the entire store corrupts (as long as it automatically "repairs"
itself by starting from scratch, of course), people will scream murder if
downloads vanish, uploads corrupt and so on, which is likely to happen with
any hand-crafted storage solution, especially on flaky hardware and in general
at first. If a DBMS worth its name doesn't manage to hold on to its data,
anything hand-crafted won't either.
Please don't go down that road. db4o may be slow, but at least it works
somewhat reliably, although less so than one might want, and slow storage is
still much better than unreliable storage for anything but data one doesn't
really care about anyway. Switch to another, non-object-oriented DBMS if
that's needed, but please don't go back to some hand-crafted thing which will
work in release x, randomly corrupt in release y and fill your disk in release
z. If this release cycle also holds true for db4o, by the way, then that's a
pretty horrible result for a DBMS.
In terms of speed it boils down to this anyway: if the db needs seek-heavy IO
to fulfil a query, then any hand-crafted storage will have to do the same,
assuming the db and the query aren't built inefficiently. If there is much to
gain by using hand-crafted storage, then there should be room for improvement
in the use of the db too, for example by not storing temporary results in the
db (like intermediate results during splitfile decoding) and just restarting
the calculation from scratch if the node really crashes.
Hm, this email got pretty long in the end. Sorry for that - end of rant.
________________________________
From: Matthew Toseland <t...@amphibian.dyndns.org>
To: Discussion of development issues <devl@freenetproject.org>
Sent: Tuesday, 4 September 2012, 12:32
Subject: Re: [freenet-dev] Disk I/O thread
On Sunday 02 Sep 2012 17:51:49 xor wrote:
> On Thursday 30 August 2012 00:40:13 Matthew Toseland wrote:
> > Sadly Freetalk/WoT do use rollback so they have to
> > commit EVERY TIME.
>
> I object to the "sadly". Transaction-based programming has proven to be a
> valid approach to solve many issues of traditional "manually undo everything
> upon error"-programming.
Let's get one thing clear to begin with: I am not advocating that Freetalk/WoT
aggregate transactions by simply not committing. The basic design flaw in
Fred's use of the database layer was to assume that object databases work as
advertised and you can simply take a big complex in-memory structure and
persist it more or less transparently, and then add a load of (de)activation
code and reduce memory usage.
Freetalk and WoT may be better designed on that level. However they produce
more disk I/O. And the reason for this is, mainstream database practice and
theory are designed for two cases:
1. Lots of data (absolute amount and transactions/sec) on fast, reliable
hardware with professional sysadmins.
2. Tiny amounts of data on commodity hardware.
If you have lots of data on commodity disks, it falls down. And note that
mainstream use of databases in the second case frequently fails. For example, I
have a huge set of bookmarks in a tree on one of my browsers. Every time I
create a new category, another one gets renamed.
> The approach of fred to not use it is just wrong from a computer-science
> perspective.
>
> The fact that it does not perform so well is an implementation issue of the
> database, not of the client code which uses the database.
No it is not. The load you put on it requires many seeks.
Of course, it might require fewer seeks if it was using a well-designed SQL
schema rather than trying to store objects. And I'm only talking about writes
here; obviously if everything doesn't fit in memory you need to do seeks on
read as well, and they can be very involved given db4o's lack of two-column
indexes. However, because of the constant fsync's, we still need loads of
seeks *even if the whole thing fits in the OS disk cache*!
>
> Also, Freetalk/WoT themselves are CPU/IO hogs due to them using the wrong
> algorithms, so we are not even at the point where we can tell whether their
> ACID-usage is a problem. There is too much noise generated from the highly
> inefficient algorithms which are being used by them, we cannot measure the
> performance impact of commit/rollback.
That's possible.
However, the bottom line is if you have to commit every few seconds, you have
to fsync every few seconds. IMHO avoiding that problem, for instance by turning
off fsync and making periodic backups, would dramatically improve performance.
As regards Freenet I am leaning towards a hand-coded on-disk structure. We
don't use queries anyway, and mostly we don't need them; most of the data could
be handled as a series of flat-files, and it would be far more robust than
db4o, and likely faster too. But the first thing to do is upgrade the database
and see whether it 1) breaks even worse or 2) fixes the problems. Past testing
suggests #1, but past testing was based on later versions of 7.4, not on the
latest 7.12.
_______________________________________________
Devl mailing list
Devl@freenetproject.org
https://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl