Trying to get caught up here, sorry for the delayed reply...

You may well be right. My comment is based on direct experience with as few as 2 clients going after the same sensors. Let me describe my little network to you. I have 10 devices total spread across 3 hub ports. There are also the DS2401s used for wind direction in my original Dallas weather station, but they are not on the net all the time. There is usually a mixture of cached and uncached access, with uncached reads used to fetch the current counter from the anemometer and the current sensor list, which gives the wind direction.

In 'normal' operation it seems to run pretty well. However, as a more strenuous test, I set my script to request all uncached reads. As soon as I start up a second process to read the sensors, I begin to get errors. That is to say, the calls to owserver time out (payload_len -1). The last time I tried it, a few minutes ago, owserver bit the dust, leaving only the parent process and a zombie. The clients were still connected to the owserver sockets and blocking on receives. owserver had to be restarted. So...
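A client doesn't have to block forever on that receive, though. Here is a minimal Python sketch of reading one reply header with a socket timeout. It assumes the header is the 24-byte message visible in the strace output further down, laid out as six 32-bit big-endian integers; that layout is a guess from the trace, and read_header and timeout_s are made-up names, not part of any OWFS client library.

```python
import socket
import struct

# Assumption: a reply header is six 32-bit big-endian signed ints (24 bytes).
HEADER = struct.Struct(">6i")

def read_header(sock, timeout_s=5.0):
    """Read one 24-byte header; return the six ints, or None on timeout or EOF."""
    sock.settimeout(timeout_s)
    buf = b""
    try:
        while len(buf) < HEADER.size:
            chunk = sock.recv(HEADER.size - len(buf))
            if not chunk:
                # Peer closed the connection before a full header arrived.
                return None
            buf += chunk
    except socket.timeout:
        # Nothing (or not enough) arrived in time; let the caller decide.
        return None
    return HEADER.unpack(buf)
```

Under that assumed layout, the second integer of the message in the trace comes out as -1, which would match the payload_len -1 the clients report. A caller that gets None back can close the socket and reconnect instead of hanging.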

I spent some time looking at this. owserver stayed up a while longer this time. It spawned 404(!) child processes, and was still going. These child processes were not terminating, but were stuck in a loop. Running strace shows some interesting behavior:

3413  --- SIGSTOP (Stopped (signal)) @ 0 (0) ---
3413  getpid()                          = 3413
3413  rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
3413  gettimeofday({1168535069, 927295}, NULL) = 0
3413  rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
3413 write(5, "\234\277\374\0\0\0\0\0\0\0\0\0\0\0\226\34\234\277\373 \0"..., 148) = 148
3413  rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
3413  rt_sigsuspend([] <unfinished ...>
3413  --- SIGRTMIN (Real-time signal 0) @ 0 (0) ---
3413  <... rt_sigsuspend resumed> )     = 32
3413  sigreturn()                       = ? (mask now [RTMIN])
.
.
bunch of nanosleeps...
.
.
3413  gettimeofday({1168535071, 478213}, NULL) = 0
3413 writev(156, [{"\0\0\0\0\377\377\377\377\0\0\0\0\0\0\0\0\0\0\0\0 \0\0\0"..., 24}], 1) = 24
3413  gettimeofday({1168535071, 480470}, NULL) = 0
3413  nanosleep({0, 100000000},  <unfinished ...>
.
couple of successful writes like above, then the problem:
.
3413  <... gettimeofday resumed> {1168535074, 528564}, NULL) = 0
3413 writev(156, [{"\0\0\0\0\377\377\377\377\0\0\0\0\0\0\0\0\0\0\0\0 \0\0\0"..., 24}], 1) = -1 EPIPE (Broken pipe)
3413  --- SIGPIPE (Broken pipe) @ 0 (0) ---
3413  gettimeofday({1168535074, 539566}, NULL) = 0
.
more nanosleeps..
.
3413 writev(156, [{"\0\0\0\0\377\377\377\377\0\0\0\0\0\0\0\0\0\0\0\0 \0\0\0"..., 24}], 1) = -1 EPIPE (Broken pipe)
3413  --- SIGPIPE (Broken pipe) @ 0 (0) ---
3413  gettimeofday({1168535097, 59694}, NULL) = 0
3413  nanosleep({0, 100000000},  <unfinished ...>

and so on, ad infinitum. I can't tell what fd 156 is or was, and the four 255s (the \377 bytes) don't mean anything to me. lsof shows "can't identify protocol". I'm guessing it was a client connection which was closed by the client when owserver still had something it wanted to write, but I'm not sure. In any case, it looks like there is no SIGPIPE handler and the EPIPE error isn't caught elsewhere, so the child keeps retrying the write instead of dying gracefully.
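To illustrate what catching the error might look like, here is a minimal Python sketch of a write helper that retries transient failures but treats a broken pipe as final. This is not owserver's code (which is C) and not a proposed patch; write_with_retry, attempts, and delay_s are hypothetical names. Python ignores SIGPIPE by default, so the failed write shows up as BrokenPipeError (errno EPIPE) rather than as a signal.

```python
import os
import time

def write_with_retry(fd, data, attempts=5, delay_s=0.1):
    """Write data to fd. Return bytes written, or None if the peer is gone.

    Only a would-block condition is retried; EPIPE ends the loop at once.
    """
    for _ in range(attempts):
        try:
            return os.write(fd, data)
        except BlockingIOError:
            # Transient: the descriptor is non-blocking and not ready yet.
            time.sleep(delay_s)
        except BrokenPipeError:
            # EPIPE: the reader has closed its end, so retrying cannot help.
            return None
    raise TimeoutError("write did not complete after %d attempts" % attempts)
```

The point is only that EPIPE is not a condition that sleeping and retrying can fix; the sensible reaction is to close the descriptor and let the process exit.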

It was the timeouts that made me make my original comment. But these hanging child processes, and the large number of them, concern me. I'm surprised my little NSLU2 didn't raise its little ARM and surrender! Let me know if there's some other info you'd like me to collect. I don't have a build environment set up for the NSLU2, so if someone else doesn't build the packages, I can't readily retest.

Thanks!

Paul


On Jan 10, 2007, at 11:50 PM, Paul Alfille wrote:

On 1/9/07, ziggy <[EMAIL PROTECTED]> wrote:

2. I don't believe this is a significant issue. While there may be no hard limits on the number of concurrent connections now, the practical limit is 1. The 1-wire is single access and can not be shared. Trying to use multiple connections simultaneously winds up making them all slow, with frequent timeouts.

This may be a reflection of your style of use. I can envision high-frequency control processes and low-frequency logging and display/monitoring processes all attacking the same 1-wire bus, particularly if some of the processes can use cached results.

Paul
_______________________________________________
Owfs-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/owfs-developers
