[freenet-dev] data store bug

Greg Wooledge Wed, 4 Sep 2002 09:32:06 -0400

So, the CofE/TFE/FF author wants details on the data store bug.  Well,
I don't use Frost, so I'll write it out here in the open where normal
people can read it.

There has been some speculation that Frost causes the data store bug
(DSB). I can't
confirm that. I have *never* used Frost, and probably never will,
and I've seen the DSB quite a few times, especially recently. So
it's not *unique* to Frost, though I can't say whether running Frost
might increase its likelihood.

Some people have said the DSB only shows up when Freenet is stopped.
Oskar (hobx) claims that's not the case; the DSB actually happens,
stealthily, while the node is running. You just don't know about it
until you try to restart the node and can't do so.

(Actually, that's not the only symptom. The other symptom, for a node
that's been DSB'd but not stopped yet, is that requests for information
go unsatisfied -- either the node doesn't answer, or it gives back blank
pages. "Document contains no data" errors in the browser, for example.)

Some people say it happens when the data store fills up. Or when one
file of a multiple-file data store fills up. Well, hawk.freenetproject.org
is running a 10 MB data store (yes, 10 *mega* bytes, that's not a typo)
and apparently it doesn't get DSB'd, or at least not commonly.

Someone in IRC (I believe it was Pascal) said that the DSB is usually
caused by the Java VM running out of memory. As far as I can tell,
he's right. If you check your node's logs after noticing a DSB,
chances are you'll see an "OutOfMemory" error message.
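
For anyone who wants to check their logs for this without reading them
by hand, here is a minimal sketch in Java. The class name LogScan and
the default file name "freenet.log" are assumptions made for
illustration; pass the real path of your node's log as the first
argument. It only looks for the text "OutOfMemory" and assumes nothing
about the log format.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class LogScan {
    // Returns every line that contains the marker text.
    static List<String> matching(List<String> lines, String marker) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            if (line.contains(marker)) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default; the real log location depends on your setup.
        String path = args.length > 0 ? args[0] : "freenet.log";
        // ISO-8859-1 accepts any byte sequence, so odd bytes in the log
        // do not abort the scan.
        List<String> lines =
            Files.readAllLines(Paths.get(path), StandardCharsets.ISO_8859_1);
        List<String> hits = matching(lines, "OutOfMemory");
        System.out.println(hits.size() + " line(s) mention OutOfMemory");
        for (String hit : hits) {
            System.out.println(hit);
        }
    }
}
```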

I'm unclear on just why a Java VM would run out. It seems that most
of them pre-allocate a large chunk of memory (64 MB), and then
dole this out to the application as it's needed. But what happens
when that initial chunk is all used up? Doesn't it allocate more?
Perhaps, perhaps not. More importantly, if it can't/won't allocate
more memory, doesn't the application get some sort of error condition
that it can check? In C, malloc() would return NULL. I have no
idea what the Java equivalent of that is. But it's clear to me that
when the JVM runs out of memory, Freenet is *not* handling it
gracefully.
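
On the malloc() question: in Java a failed allocation does not return
null. The VM throws java.lang.OutOfMemoryError. That class is an Error,
not an Exception, so a handler written as "catch (Exception e)" never
sees it. The sketch below wraps one allocation so that it behaves like
the C convention. The names OomDemo and tryAllocate are made up for
illustration and are not taken from the Freenet source. Catching the
error like this is only plausible when a single large request fails;
it says nothing about how the node itself should recover.

```java
class OomDemo {
    // Tries to allocate a buffer of the given size. Returns null on
    // failure, mimicking malloc() returning NULL.
    static byte[] tryAllocate(long bytes) {
        if (bytes < 0 || bytes > Integer.MAX_VALUE) {
            return null; // Java arrays cannot be this large (or negative)
        }
        try {
            return new byte[(int) bytes];
        } catch (OutOfMemoryError e) {
            // The heap could not satisfy the request.
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] small = tryAllocate(1024);
        System.out.println("1 KB request: " + (small != null ? "ok" : "failed"));

        // Ask for one byte more than the VM's reported heap limit.
        long tooBig = Runtime.getRuntime().maxMemory() + 1;
        byte[] huge = tryAllocate(tooBig);
        System.out.println("oversized request: " + (huge != null ? "ok" : "failed"));
    }
}
```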

The workaround that Pascal(?) has suggested is to raise the amount
of memory the Java VM is permitted to use. This seems to require
two steps:

1) On the java command line that starts Freenet (usually in a shell
script), add a parameter which tells the Java VM to request more
memory. E.g., on Kaffe, this is "-mx".

2) Within the operating system environment, make sure that the amount
of memory the Java VM requests can actually be fulfilled. This
may mean running "ulimit -d" to raise the maximum process data
segment size. Kaffe will happily pre-allocate 256 MB even if
the data segment size is limited to less than that; but if you
actually *use* that much memory, *blam*, DSB time.
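
To see whether step 1 took effect, a small Java program can print the
heap limits the VM reports. This is a sketch; the class name HeapReport
is an assumption. Run it with the same memory flag you give the node
(on Sun's VM the flag is spelled "-Xmx", e.g. "java -Xmx256m
HeapReport"). It shows only what the VM believes its limit is. It does
not prove that the operating system will deliver that much, which is
the point of step 2.

```java
class HeapReport {
    // Converts a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most the VM will try to use for the heap.
        System.out.println("max heap (MiB):      " + toMiB(rt.maxMemory()));
        // totalMemory(): what the VM has claimed for the heap so far.
        System.out.println("claimed so far (MiB): " + toMiB(rt.totalMemory()));
        // freeMemory(): the unused part of what has been claimed.
        System.out.println("free of that (MiB):   " + toMiB(rt.freeMemory()));
    }
}
```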

What may be needed is a thorough code audit to add error checking to
the Freenet node, so that it doesn't puke when memory is exhausted.
Either that, or make it use less memory in the first place. I'm no
Java programmer, so I'll leave that one alone.

As an end user, one thing that *might* work to decrease the amount of
memory used by the node is to lower the number of threads it's allowed
to use. On the other hand, when I asked the developers how many open
files the node uses, as a function of the number of threads, I was
greeted by a looming silence. Read what you will into that.

The DSB might be more common with some Java VM implementations than
with others. Kaffe seems to be more aggressive in its memory use
than the Sun JRE, but then again, I haven't conducted a rigorous
investigation yet. It's *REALLY* hard to do this when Sun's
"write once run anywhere" solution doesn't *ACTUALLY* run anywhere.
It only runs on Windows, Solaris and Linux. That leaves a lot of
us out in the cold....

Finally, some people have suggested backing up the data store periodically,
so that it can be restored to its pre-DSB state. This is a reasonable
suggestion, except that there's no deterministic way to know in
advance whether a given store file is DSB'd or not. Backing up a
DSB'd store is a colossal waste of time and space, especially if you
overwrite a good backup with a bad one. If you do this, you'll need
at least *two* backups, rotated. That way, if you restart the node
after a backup and find the store is corrupt, you can still restore
the older, good copy rather than the bad one you just created.
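
The two-backup rotation can be sketched in Java as follows. This is an
illustration under stated assumptions, not a tested backup tool: it
assumes the store is a single file, that the node is not writing to it
during the copy, and the names StoreBackup, store.bak1 and store.bak2
are invented here. A multiple-file store would need the same treatment
for each file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class StoreBackup {
    // Keeps two generations: the previous backup moves to `older`
    // before the live store is copied to `newest`.
    static void rotate(Path store, Path newest, Path older) {
        try {
            if (Files.exists(newest)) {
                Files.move(newest, older, StandardCopyOption.REPLACE_EXISTING);
            }
            Files.copy(store, newest, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demonstration in a temporary directory with a stand-in "store" file.
    // Returns the contents of the newest and the older backup.
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("dsb-backup");
            Path store = dir.resolve("store");
            Path newest = dir.resolve("store.bak1");
            Path older = dir.resolve("store.bak2");

            Files.write(store, "v1".getBytes(StandardCharsets.UTF_8));
            rotate(store, newest, older);
            Files.write(store, "v2".getBytes(StandardCharsets.UTF_8));
            rotate(store, newest, older);

            String n = new String(Files.readAllBytes(newest), StandardCharsets.UTF_8);
            String o = new String(Files.readAllBytes(older), StandardCharsets.UTF_8);
            return n + "," + o;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

If the second backup in the demonstration had captured a corrupt store,
the copy in store.bak2 would still hold the earlier state.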

That's all for now.

--
Greg Wooledge | "Truth belongs to everybody."
greg at wooledge.org | - The Red Hot Chili Peppers
http://wooledge.org/~greg/ |