> Freetalk and WoT may be better designed on that level. However they
> produce more disk I/O. And the reason for this is, mainstream database
> practice and theory are designed for two cases:
> 1. Lots of data (absolute amount and transactions/sec) on fast, reliable
> hardware with professional sysadmins.
> 2. Tiny amounts of data on commodity hardware.
> If you have lots of data on commodity disks, it falls down. And note that
> mainstream use of databases in the second case frequently fails. For
> example, I have a huge set of bookmarks in a tree on one of my browsers.
> Every time I create a new category, another one gets renamed.
That's not a database error, that's an application error for sure. The reason
databases fail on standard consumer hardware is that these systems buffer
writes on various levels, undermining the whole ACID architecture. This leads
to half-applied transactions or outright database corruption on power failure
or even a mere OS crash. The results are random, though the standard outcome
is a corrupt db which just doesn't come back up; any replicable behaviour is
unlikely to be related to this.
Apart from that, databases of any size will run reliably on any hardware as
long as you ensure fsync does what it is supposed to do and don't,
intentionally or unintentionally, disable any of the transaction-based safety
features. Performance will suffer on consumer hardware, greatly so if the
indices are too big to be cached, but identical workloads run slower on slower
hardware, no surprise there. It doesn't mean that there is no point in running
large DBs on consumer hardware, or that doing so is somehow inherently
unreliable and you end up with randomly modified data sets.
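To be concrete about what "fsync does what it is supposed to do" means, here
is a minimal Java sketch of the pattern every DBMS commit ultimately boils
down to; the file name and record are made up for illustration:

    import java.io.RandomAccessFile;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.StandardCharsets;

    public class DurableWrite {
        public static void main(String[] args) throws Exception {
            try (RandomAccessFile raf = new RandomAccessFile("commit.log", "rw")) {
                FileChannel channel = raf.getChannel();
                ByteBuffer record = ByteBuffer.wrap(
                        "COMMIT txn=42\n".getBytes(StandardCharsets.UTF_8));
                while (record.hasRemaining()) {
                    channel.write(record); // lands in the OS page cache first
                }
                // force(true) is Java's fsync: block until data and metadata
                // are on stable storage. If a write buffer along the way lies
                // about this, the ACID guarantees silently evaporate.
                channel.force(true);
            }
        }
    }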
>> The approach of fred to not use it is just wrong from a computer-science
>> perspective.
>>
>> The fact that it does not perform so well is an implementation issue of the
>> database, not of the client code which uses the database.
>
>No it is not. The load you put on it requires many seeks.
Are you sure of that? The primary database will require seeking on pretty much
any access due to its nature, alright, but it doesn't actually need to be
accessed all that often. How many requests must a busy node handle per second?
Maybe 10, if that, and those will be lucky to cause 10 IOs. Your standard SATA
disk can deal with more than that; the standard figures are usually 50-100
IOPS. Write requests to the DBs, caused by insert requests or fetched unknown
data, will require multiple seeks and drive the load up a bit, but if the
"writes per second to db" figures of my node are any indication, this won't
cause severe disk load either; it just doesn't occur often enough.
This also matches my general experience with Freenet; the main DBs aren't the
issue. Just keep the node running on its own and there won't be any problems,
even with 500+ GB stores on old 5400 rpm SATA disks. Load the node with local
requests, be it downloads or something else, and the whole thing gets severely
disk-limited very fast. With stuff like WoT, Freetalk or the Spider it is even
worse; these can easily push things to unbearable levels in my experience.
So IMHO the offender is rather the db4o database, which is basically what the
client code uses. The load here is different, though; I don't really see why
it must strictly be seek-heavy for long stretches of time. Splitfile handling,
for example, is more of a bulk-data affair. Ideally it should pull the data
out of the db as needed during a decode, keep temporary results in memory or
in tempfiles if necessary, and only store the final result back to the db,
minimising writes. That is still the equivalent of reading a terribly
fragmented file from disk, but it's not an insurmountable task for a standard
disk. Considering how IO-limited splitfile decoding often is on standard
disks, and the time it takes, I really suspect that it loads the DB with
unneeded stuff, trying to minimise memory footprint, temp-file usage or
something like that. Especially since just the data retrieval, before the
request actually completes, is often already IO-heavy in my experience.
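To illustrate what I mean by minimising writes during a decode, here is a
rough Java sketch. Every name in it (Database, fetchBlock, fecDecode,
storeResult) is invented for the example; this is not the actual fred API:

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical decode loop: read blocks from the db, do all the
    // intermediate work in memory (or a tempfile), write back exactly
    // one final result.
    public class SplitfileDecodeSketch {

        byte[] decodeSegment(Database db, long segmentId, int blockCount) {
            List<byte[]> blocks = new ArrayList<>();
            for (int i = 0; i < blockCount; i++) {
                // Reads may seek, but they are cacheable and need no fsync.
                blocks.add(db.fetchBlock(segmentId, i));
            }
            // All intermediate FEC state stays out of the db; if the node
            // crashes we simply redo the decode from the still-present
            // blocks instead of persisting partial results.
            byte[] result = fecDecode(blocks);
            // One write, one commit, one fsync for the whole segment.
            db.storeResult(segmentId, result);
            return result;
        }

        // Stand-ins so the sketch is self-contained.
        interface Database {
            byte[] fetchBlock(long segmentId, int index);
            void storeResult(long segmentId, byte[] result);
        }

        byte[] fecDecode(List<byte[]> blocks) {
            // Placeholder for the actual FEC/erasure decoding.
            int total = 0;
            for (byte[] b : blocks) total += b.length;
            byte[] out = new byte[total];
            int off = 0;
            for (byte[] b : blocks) {
                System.arraycopy(b, 0, out, off, b.length);
                off += b.length;
            }
            return out;
        }
    }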
The same goes for Freetalk & co: they can't really cause many transactions per
second, since Freenet just can't fetch data fast enough to cause enough
changes. So either they run really complex queries on really large or complex
data sets, or the db is inefficiently structured or inefficiently accessed. No
idea, but it certainly seems weird.
I suspect in general that the db4o database is hit with way too many writes
per second, which is what kills DB performance fast. Even your average
professionally run DB, the kind you talk about above, quite often runs on a
RAID 5 or even RAID 6 array of 10k rpm disks, simply because that's a
cost-effective way to store bulky stuff, and cost is saved everywhere if at
all possible. Those arrays will happily deal with very high read IOPS compared
to your standard consumer disk, but won't be happy at all about lots of random
writes either, although big battery-backed write buffers help to some degree.
So I am not so sure that this is just an issue of consumer hardware or of the
inherent type of load Freenet has to deal with, and not actually inefficient
use of the DB paired with a DBMS that is inefficient for this use-case to
begin with.
Apart from the whole performance thing:
I doubt that fred will get away from rollbacks completely just by never
manually triggering one; at the very least a node crash and any query errors
should cause an automatic rollback to the state of the last commit. Does fred
take this into account, meaning that it only commits when one logical
"transaction" is done? If fred just commits every x DB actions, or if the
trigger is time-based, then this could cause logical data corruption in the
form of orphaned entries, partially modified ones and so on whenever the node
is killed hard in some form or other, which could then explain some of the
db4o corruption going on. That would basically be true transaction abuse.
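In code the difference looks roughly like this. I'm assuming the db4o embedded
API here (ObjectContainer with store/commit/rollback); the domain classes are
made up for the example:

    import com.db4o.ObjectContainer;

    // Hypothetical example: inserting a download consisting of a request
    // entry plus its block entries. Commit exactly once, at the logical
    // transaction boundary.
    public class CommitBoundarySketch {

        void addDownload(ObjectContainer db, Request request, Block[] blocks) {
            try {
                db.store(request);
                for (Block b : blocks) {
                    db.store(b);
                    // WRONG place to commit: doing it here every x stores,
                    // or on a timer, means a hard kill leaves a Request
                    // with half its Blocks - orphaned entries the code
                    // never expects to see.
                }
                // RIGHT place: one commit once the logical transaction is
                // complete. A crash before this line rolls back to the
                // last commit and the data set stays consistent.
                db.commit();
            } catch (RuntimeException e) {
                db.rollback();
                throw e;
            }
        }

        // Invented domain classes, just to keep the sketch self-contained.
        static class Request {}
        static class Block {}
    }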
Considering rollbacks generally "good" or "the right way", though, isn't
really right in my opinion. Avoiding transactions is evil; avoiding rollbacks
might be, but isn't necessarily so. Abusing rollbacks to implement standard
application logic (just always start inserting or updating stuff, and if you
notice halfway through that you don't actually need to, roll back) is
certainly evil too, using up DB resources and locking tables for no good
reason. Best paired with long-running transactions involving as many tables as
possible, to truly lock down the whole DB for everyone else for as long as
possible.
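A quick sketch of that anti-pattern next to the sane version, again with
invented names:

    import com.db4o.ObjectContainer;

    public class RollbackAbuseSketch {

        // Anti-pattern: write first, ask questions later. Burns DB
        // resources and holds locks just to discover the work was
        // never needed.
        void updateAbusingRollback(ObjectContainer db, Profile p) {
            db.store(p);           // speculative write
            if (!needsUpdate(p)) {
                db.rollback();     // rollback as an "oops, never mind"
                return;
            }
            db.commit();
        }

        // Sane version: decide first, touch the DB only when necessary.
        // Rollback stays what it is for: recovering from actual errors.
        void updateCheckingFirst(ObjectContainer db, Profile p) {
            if (!needsUpdate(p)) {
                return;
            }
            db.store(p);
            db.commit();
        }

        boolean needsUpdate(Profile p) { return true; } // placeholder
        static class Profile {}
    }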
> Of course, it might require fewer seeks if it was using a well-designed
> SQL schema rather than trying to store objects. And I'm only talking
> about writes here; obviously if everything doesn't fit in memory you
> need to do seeks on read as well, and they can be very involved given
> db4o's lack of two-column indexes. However, because of the constant
> fsync's, we still need loads of seeks *even if the whole thing fits in
> the OS disk cache*!
I don't know db4o, but single-column indices shouldn't be a big problem as
long as the result set per index lookup doesn't get too big. If you have some
kind of boolean column somewhere in a big table, though, a query on it
basically leads to a partial index scan, which is still doable if the index is
in memory but ends up horribly if it is not. Has anyone checked how db4o
executes multi-column queries? Some DBMSs have quite braindead query
optimisers which fall back to table scans at the slightest suboptimal index
situation, so this might be worth a shot ...
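One cheap way to check would be to time a two-column query against its
single-column variants. A minimal probe, assuming db4o's SODA query API
(query/descend/constrain/execute); the Message class and its fields are
invented:

    import com.db4o.ObjectContainer;
    import com.db4o.ObjectSet;
    import com.db4o.query.Query;

    public class TwoColumnQueryProbe {

        // Hypothetical table: boardId is selective, read is a boolean.
        static class Message {
            long boardId;
            boolean read;
        }

        ObjectSet<Message> unreadOnBoard(ObjectContainer db, long boardId) {
            long start = System.nanoTime();
            Query q = db.query();
            q.constrain(Message.class);
            // Without a two-column index the engine has to pick ONE of
            // these constraints to drive the index lookup and filter the
            // rest; if it picks the boolean, this degenerates towards a
            // partial index scan or worse.
            q.descend("boardId").constrain(boardId);
            q.descend("read").constrain(false);
            ObjectSet<Message> result = q.execute();
            System.out.printf("query took %.1f ms, %d hits%n",
                    (System.nanoTime() - start) / 1e6, result.size());
            return result;
        }
    }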
> However, the bottom line is if you have to commit every few seconds, you
> have to fsync every few seconds. IMHO avoiding that problem, for
> instance by turning off fsync and making periodic backups, would
> dramatically improve performance.
Fsync is what keeps your data safe; it isn't some unnecessary habit the DBMS
indulges in because it likes to throw a tantrum. Disabling it and resorting to
backups is the classic way to render one of the core guarantees of any DBMS
worth its name completely and utterly useless, and to end up with messy
corruption-detection routines (no, the db isn't fine just because it still
loads) and, in the end, with completely inconsistent data sets, broken backups
and what not. Have fun with that.
> As regards Freenet I am leaning towards a hand-coded on-disk structure.
> We don't use queries anyway, and mostly we don't need them; most of the
> data could be handled as a series of flat-files, and it would be far
> more robust than db4o, and likely faster too.
Usually one wants to store some kind of object and reference it via a key. To
store it in a flat file one needs to serialise it to a string, or rather to
binary data to be exact. Your standard RDBMS works with tables which can store
any sort of binary data, provides access to it via a key, multiple keys even,
and, using indices, does so resource-efficiently too. This must be fate ;)
Seriously, a hand-coded and optimised flat-file store may outperform a
general-purpose DBMS in some situations, as is the case with all hand-crafted
stuff compared to general-purpose solutions, IF the hand-crafted stuff is done
well. But I very much doubt that you will ever get close to the reliability
your typical DBMS provides, surviving software crashes, hardware malfunctions
and what not, as long as the circumstances allow it to at all.
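Just to give an idea of the baseline any hand-crafted store has to meet: even
the trivial case of replacing one flat file crash-safely needs the
write-tempfile, fsync, atomic-rename dance, otherwise a crash mid-write leaves
a torn file. A minimal Java sketch, file names invented:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;
    import java.nio.file.StandardOpenOption;

    public class AtomicFlatFileUpdate {

        // Replace the target file so that a crash at any point leaves
        // either the complete old contents or the complete new contents.
        static void atomicReplace(Path target, byte[] newContents)
                throws IOException {
            Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
            try (FileChannel ch = FileChannel.open(tmp,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                    StandardOpenOption.TRUNCATE_EXISTING)) {
                ch.write(ByteBuffer.wrap(newContents));
                ch.force(true); // fsync the new contents before the rename
            }
            // The rename is atomic on POSIX filesystems. And this is only
            // ONE of the failure cases (torn in-place updates, index/data
            // mismatches, partial appends) a DBMS already handles for us.
            Files.move(tmp, target,
                    StandardCopyOption.ATOMIC_MOVE,
                    StandardCopyOption.REPLACE_EXISTING);
        }
    }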
Now, trading reliability for more speed may be a valid thing sometimes,
agreed, the same way that trading architectural "cleanness", normalisation for
example, for more performance is sometimes valid. But while nobody will really
care if some key in their node's store links to the wrong data, and most won't
even care if the entire store corrupts (as long as it automatically "repairs"
itself by starting from scratch, of course), people will scream murder if
downloads vanish, uploads corrupt and so on, which is likely to happen with
any hand-crafted storage solution, especially on flaky hardware and in general
at first. If a DBMS worth its name doesn't manage to hold on to its data,
anything hand-crafted won't either.
Please don't go down that road. db4o may be slow, but at least it works
somewhat reliably, although less so than one might want, and slow storage is
still much better than unreliable storage for anything but data one doesn't
really care about anyway. Switch to another, non-object-oriented DBMS if
that's needed, but please don't go back to some hand-crafted thing which will
work in release x, randomly corrupt in release y and fill your disk in release
z. If this release cycle also holds true for db4o, by the way, then that's a
pretty horrible result for a DBMS.
In terms of speed it boils down to this anyway: if the db needs seek-heavy IO
to fulfil a query, then any hand-crafted storage will have to do the same,
assuming the db and the query aren't built inefficiently. If there is much to
gain by using hand-crafted storage, then there should be room for improvement
in the use of the db too, for example by not storing temporary results in the
db (like intermediate results during splitfile decoding) and just restarting
the calculation from scratch if the node really crashes.
Hm, this email got pretty long in the end. Sorry for that - end of rant.
________________________________
From: Matthew Toseland <t...@amphibian.dyndns.org>
To: Discussion of development issues <devl@freenetproject.org>
Sent: Tuesday, 4 September 2012, 12:32
Subject: Re: [freenet-dev] Disk I/O thread
On Sunday 02 Sep 2012 17:51:49 xor wrote:
> On Thursday 30 August 2012 00:40:13 Matthew Toseland wrote:
> > Sadly Freetalk/WoT do use rollback so they have to
> > commit EVERY TIME.
>
> I object to the "sadly". Transaction-based programming has proven to be a
> valid approach to solve many issues of traditional "manually undo everything
> upon error"-programming.
Let's get one thing clear to begin with: I am not advocating that Freetalk/WoT
aggregate transactions by simply not committing. The basic design flaw in
Fred's use of the database layer was to assume that object databases work as
advertised and you can simply take a big complex in-memory structure and
persist it more or less transparently, and then add a load of (de)activation
code and reduce memory usage.
Freetalk and WoT may be better designed on that level. However they produce
more disk I/O. And the reason for this is, mainstream database practice and
theory are designed for two cases:
1. Lots of data (absolute amount and transactions/sec) on fast, reliable
hardware with professional sysadmins.
2. Tiny amounts of data on commodity hardware.
If you have lots of data on commodity disks, it falls down. And note that
mainstream use of databases in the second case frequently fails. For example, I
have a huge set of bookmarks in a tree on one of my browsers. Every time I
create a new category, another one gets renamed.
> The approach of fred to not use it is just wrong from a computer-science
> perspective.
>
> The fact that it does not perform so well is an implementation issue of the
> database, not of the client code which uses the database.
No it is not. The load you put on it requires many seeks.
Of course, it might require fewer seeks if it was using a well-designed SQL
schema rather than trying to store objects. And I'm only talking about writes
here; obviously if everything doesn't fit in memory you need to do seeks on
read as well, and they can be very involved given db4o's lack of two-column
indexes. However, because of the constant fsync's, we still need loads of
seeks *even if the whole thing fits in the OS disk cache*!
>
> Also, Freetalk/WoT themselves are CPU/IO hogs due to them using the wrong
> algorithms, so we are not even at the point where we can tell whether their
> ACID-usage is a problem. There is too much noise generated from the highly
> inefficient algorithms which are being used by them, we cannot measure the
> performance impact of commit/rollback.
That's possible.
However, the bottom line is if you have to commit every few seconds, you have
to fsync every few seconds. IMHO avoiding that problem, for instance by turning
off fsync and making periodic backups, would dramatically improve performance.
As regards Freenet I am leaning towards a hand-coded on-disk structure. We
don't use queries anyway, and mostly we don't need them; most of the data could
be handled as a series of flat-files, and it would be far more robust than
db4o, and likely faster too. But the first thing to do is upgrade the database
and see whether it 1) breaks even worse or 2) fixes the problems. Past testing
suggests #1, but past testing was based on later versions of 7.4, not on the
latest 7.12.
_______________________________________________
Devl mailing list
Devl@freenetproject.org
https://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl