Hello all,
Some time ago, we switched to Linux and installed the Linux AX.25
subsystem, the AX.25 utilities and the FBB BBS.
It seemed to run smoothly after the usual configuration trouble. But
now we are running into some problems that seem to occur more
and more often.
The first one has already been mentioned on our mailing lists. It is
the appearance of error messages like
socket error: write on socket: broken pipe
when using FBB with the kernel-based AX.25. This seems to be
associated with another mechanism that sometimes causes
trouble. The BBS writes its data to the kernel AX.25 socket and
fills its buffers. Once all data is written, FBB seems to start the
timeout countdown for that user. But on bad or slow links
such as 1200 bps, it may take some time until all the data has
actually been sent out to the user.
As a result, FBB can time out before the machine has actually
sent all the data to the user!
The effect can be seen as a sudden disconnect (DISC+) without
any visible reason for the user. It almost never occurs when the
downlink is of good quality, i.e. all packets are being sent out
without errors or retries.
But if conditions are bad and the BBS machine has to
resend some frames, it is very likely that the user will be
disconnected suddenly without a chance to finish the
download.
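If this theory holds, one conceivable remedy is to start the idle
countdown only once nothing is left waiting to be sent. The sketch
below is not FBB code and not a kernel interface. IdleTimer and
pending_bytes are hypothetical names, and how the number of unsent
bytes would be obtained for a real AX.25 socket is left open here.

```python
import time


class IdleTimer:
    """Idle timeout that only counts while no data is waiting to be sent."""

    def __init__(self, timeout_s, pending_bytes, clock=time.monotonic):
        self.timeout_s = timeout_s
        # Hypothetical callable returning the number of bytes that have
        # been written by the application but not yet sent to the user.
        self.pending_bytes = pending_bytes
        self.clock = clock
        self.idle_since = None  # None means: countdown not running

    def user_activity(self):
        """Call whenever the user sends something; stops the countdown."""
        self.idle_since = None

    def expired(self):
        """Return True only after timeout_s of real idleness."""
        now = self.clock()
        if self.pending_bytes() > 0:
            # Data is still on its way out, so the user cannot be idle yet.
            self.idle_since = None
            return False
        if self.idle_since is None:
            # Queue has just drained: start the countdown now.
            self.idle_since = now
        return now - self.idle_since >= self.timeout_s
```

With such a check, a slow link with many retries would only delay
the start of the countdown instead of eating into it.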
Please note that this theory is only an attempt to explain that weird
behaviour. Using the call program included in the AX.25 utilities,
we never experienced such problems, no matter how bad the link
was. This leads to the conclusion that the "handshaking" between
FBB and the AX.25 subsystem may not be optimal.
By the way: The same effect can be experienced when using the
FBB BBS together with the G8BPQ Packet Switch under DOS.
There, too, users sometimes get timeouts when they are downloading
a lot of data over a bad link.
Yesterday, things got even worse. Sometimes users were
disconnected very shortly after logging in. Then, suddenly, no more
connects were possible on one port. The BBS simply did not
answer.
Trying another port, the user got the usual message prompting for
the login. Immediately after this message was displayed, however, a
disconnect followed. There was no chance to log in.
To test the other services on the machine we tried to connect to
our node (AWZnode). The node came up as usual but showed no
reaction to any input.
We then found some strange things on the Linux
machine. The first was that the system's clock did not have
the correct time and date. Obviously, there was something wrong
with the clock. The machine today had a system date of April 8th.
Could it be that running Linux for a couple of weeks without
restarting affects the system's clock?
The second thing was more interesting: The mheard data file
/var/ax25/mheard/mheard.bin had grown to 56 MBytes!
After deleting this file, AWZnode resumed working.
Question here: How can this be avoided? Is there a way to
configure mheardd to set a maximum size for its data file?
At the moment, we have set up a crontab entry so that the file is
now deleted once a month. Is this really the way to go, or are there
other options?
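As a possible refinement of the cron approach, the file could be
removed only when it has actually grown beyond some limit. The
following is a minimal sketch, not a feature of mheardd. The function
name and the 1 MByte limit are arbitrary assumptions, and it relies
on our observation that deleting the file does no harm.

```python
import os


def remove_if_oversized(path, max_bytes):
    """Delete path if it exists and is larger than max_bytes.

    Returns True if the file was removed, False otherwise.
    """
    try:
        size = os.path.getsize(path)
    except FileNotFoundError:
        return False  # no file, nothing to do
    if size <= max_bytes:
        return False  # still small enough, keep the heard list
    os.remove(path)
    return True


if __name__ == "__main__":
    # Path as mentioned above; the 1 MByte limit is an assumed value.
    remove_if_oversized("/var/ax25/mheard/mheard.bin", 1024 * 1024)
```

Run from cron once a day, for example, this would keep the heard
list as long as it stays small and only throw it away when it
threatens to get out of hand.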
Hoping that someone has a hint or two for
solving these problems,
Best regards, 73
Gerd