Trying to get caught up here, sorry for the delayed reply...
You may well be right. My comment is based on direct experience with
as few as 2 clients going after the same sensors. Let me describe my
little network to you. I have 10 devices total spread across 3 hub
ports. There are also the DS2401s used for wind direction in my
original Dallas weather station, but they are not on the net all the
time. There is usually a mixture of cached and uncached access, with
uncached reads used to get the current counter from the anemometer
and the current sensor list (which is how the wind direction is read).
In 'normal' operation it seems to run pretty well. However, in a more
strenuous test, I set my script to request all uncached reads. As
soon as I start up a second process to read the sensors, I begin to
get errors. That is to say, the calls to owserver time out
(payload_len -1). The last time I tried it a few minutes ago,
owserver bit the dust after leaving only the parent process and a
zombie. The clients were still connected to the owserver sockets, and
blocking on receives. owserver had to be restarted. So...
I spent some time looking at this. owserver stayed up a while longer
this time. It spawned 404(!) child processes, and was still going.
These child processes were not terminating, but were stuck in a loop.
Running strace shows some interesting behavior:
3413 --- SIGSTOP (Stopped (signal)) @ 0 (0) ---
3413 getpid() = 3413
3413 rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
3413 gettimeofday({1168535069, 927295}, NULL) = 0
3413 rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
3413 write(5, "\234\277\374\0\0\0\0\0\0\0\0\0\0\0\226\34\234\277\373\0"..., 148) = 148
3413 rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
3413 rt_sigsuspend([] <unfinished ...>
3413 --- SIGRTMIN (Real-time signal 0) @ 0 (0) ---
3413 <... rt_sigsuspend resumed> ) = 32
3413 sigreturn() = ? (mask now [RTMIN])
.
.
bunch of nanosleeps...
.
.
3413 gettimeofday({1168535071, 478213}, NULL) = 0
3413 writev(156, [{"\0\0\0\0\377\377\377\377\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 24}], 1) = 24
3413 gettimeofday({1168535071, 480470}, NULL) = 0
3413 nanosleep({0, 100000000}, <unfinished ...>
.
couple of successful writes like above, then the problem:
.
3413 <... gettimeofday resumed> {1168535074, 528564}, NULL) = 0
3413 writev(156, [{"\0\0\0\0\377\377\377\377\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 24}], 1) = -1 EPIPE (Broken pipe)
3413 --- SIGPIPE (Broken pipe) @ 0 (0) ---
3413 gettimeofday({1168535074, 539566}, NULL) = 0
.
more nanosleeps..
.
3413 writev(156, [{"\0\0\0\0\377\377\377\377\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 24}], 1) = -1 EPIPE (Broken pipe)
3413 --- SIGPIPE (Broken pipe) @ 0 (0) ---
3413 gettimeofday({1168535097, 59694}, NULL) = 0
3413 nanosleep({0, 100000000}, <unfinished ...>
and so on, ad infinitum. I can't tell what fd 156 is or was, and the
four 255s (the \377 bytes) don't mean anything to me. lsof shows
"can't identify protocol".
I'm guessing it was a client connection that the client closed while
owserver still had something it wanted to write, but I'm not sure. In
any case, SIGPIPE doesn't kill the child (its default action would),
so it must be ignored or handled somewhere, and the EPIPE error from
writev isn't acted on either, so the child never gives up and exits.
It was the timeouts that prompted my original comment. But these hung
child processes, and the sheer number of them, concern me. I'm
surprised my little NSLU2 didn't raise its little ARM and surrender!
Let me know if there's some other info you'd like me to collect. I
don't have a build environment setup for the NSLU2, so if someone
else doesn't build the packages, I can't readily retest.
Thanks!
Paul
On Jan 10, 2007, at 11:50 PM, Paul Alfille wrote:
On 1/9/07, ziggy <[EMAIL PROTECTED]> wrote:
2. I don't believe this is a significant issue. While there may be
no hard limits on the number of concurrent connections now, the
practical limit is 1. The 1-wire bus is single-access and cannot be
shared. Trying to use multiple connections simultaneously winds up
making them all slow, with frequent timeouts.
This may be a reflection of your style of use. I can envision
high-frequency control processes and low-frequency logging and
display/monitoring processes all attacking the same 1-wire bus,
particularly if some of the processes can use cached results.
Paul
Owfs-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/owfs-developers