Paul, I have tried to find any possible memory leaks in the flow below, but I can't find any reason for the leak.

==00:01:38:06.536 20943== 360,480 (278,560 direct, 81,920 indirect) bytes in 1,741 blocks are definitely lost in loss record 5 of 5
==00:01:38:06.536 20943==    at 0x4A05809: malloc (vg_replace_malloc.c:149)
==00:01:38:06.536 20943==    by 0x3C52A52FBE: FS_OWQ_from_pn (ow_parseobject.c:87)
==00:01:38:06.536 20943==    by 0x3C52A530AC: FS_OWQ_create_sibling (ow_parseobject.c:53)
==00:01:38:06.536 20943==    by 0x3C52A58AFB: FS_r_sibling_F (ow_sibling.c:49)
==00:01:38:06.536 20943==    by 0x3C52A27C7B: FS_slowtemp (ow_1820.c:282)
==00:01:38:06.536 20943==    by 0x3C52A53FAC: FS_read_lump (ow_read.c:472)
==00:01:38:06.536 20943==    by 0x3C52A542F4: FS_r_local (ow_read.c:427)
==00:01:38:06.536 20943==    by 0x3C52A54967: FS_r_given_bus (ow_read.c:232)
==00:01:38:06.536 20943==    by 0x3C52A54B01: FS_read_distribute (ow_read.c:193)
==00:01:38:06.536 20943==    by 0x3C52A55223: FS_read_postparse (ow_read.c:109)
==00:01:38:06.536 20943==    by 0x402AAD: ReadHandler (read.c:86)
==00:01:38:06.536 20943==    by 0x40339A: DataHandler (data.c:125)

==00:01:38:06.508 20943== ERROR SUMMARY: 21 errors from 1 contexts (suppressed: 4 from 1)
==00:01:38:06.508 20943==
==00:01:38:06.508 20943== 21 errors in context 1 of 1:
==00:01:38:06.508 20943== Invalid read of size 8
==00:01:38:06.508 20943==    at 0x3C52A4D8B7: LockGet (ow_locks.c:193)
==00:01:38:06.508 20943==    by 0x3C52A548A1: FS_r_given_bus (ow_read.c:231)
==00:01:38:06.508 20943==    by 0x3C52A54B01: FS_read_distribute (ow_read.c:193)
==00:01:38:06.508 20943==    by 0x3C52A55223: FS_read_postparse (ow_read.c:109)
==00:01:38:06.508 20943==    by 0x402AAD: ReadHandler (read.c:86)
==00:01:38:06.508 20943==    by 0x40339A: DataHandler (data.c:125)
==00:01:38:06.508 20943==    by 0x30DA6062F6: start_thread (in /lib64/libpthread-2.5.so)
==00:01:38:06.509 20943==    by 0x30D8ED1E3C: clone (in /lib64/libc-2.5.so)
==00:01:38:06.509 20943==  Address 0x61A3430 is 0 bytes inside a block of size 32 free'd
==00:01:38:06.509 20943==    at 0x4A0541E: free (vg_replace_malloc.c:233)
==00:01:38:06.509 20943==    by 0x30D8ED02B6: tdelete (in /lib64/libc-2.5.so)
==00:01:38:06.509 20943==    by 0x3C52A4D763: LockRelease (ow_locks.c:209)
==00:01:38:06.509 20943==    by 0x3C52A5498B: FS_r_given_bus (ow_read.c:238)
==00:01:38:06.509 20943==    by 0x3C52A54B01: FS_read_distribute (ow_read.c:193)
==00:01:38:06.509 20943==    by 0x3C52A55223: FS_read_postparse (ow_read.c:109)
==00:01:38:06.509 20943==    by 0x402AAD: ReadHandler (read.c:86)
==00:01:38:06.509 20943==    by 0x40339A: DataHandler (data.c:125)
==00:01:38:06.509 20943==    by 0x30DA6062F6: start_thread (in /lib64/libpthread-2.5.so)
==00:01:38:06.509 20943==    by 0x30D8ED1E3C: clone (in /lib64/libc-2.5.so)

It looks as if LockGet() and LockRelease() fail as well. This might be
triggered by an unsuccessful temperature reading, so that the memory isn't
allocated and freed correctly at some point.

I have to go home now, and I won't be able to look into this further until
next year.

/Christian


-----Original Message-----
From: Serg Oskin [mailto:s...@oskin.ru]
Sent: Tuesday, December 23, 2008 4:10 PM
To: owfs-developers@lists.sourceforge.net
Subject: Re: [Owfs-developers] general protection

Running:
/usr/bin/valgrind --time-stamp=yes --leak-check=full --leak-resolution=high \
    --log-file=/tmp/owserver.log --trace-children=yes --undef-value-errors=yes \
    --verbose /usr/sbin/owserver -p 30003 -d /dev/ttyS0 -t 30 --foreground \
    --fatal_debug --fatal_debug_file=/tmp/owserver_fatal

Neither after the "Invalid read of size 8" appears nor after Ctrl-C is a
/tmp/owserver_fatal* file created.

Serg.

>> I have tried to stress-test owserver as much as I can here on
>> different platforms, and I can't reproduce the errors.
>
> In my case, owserver is used (via owread) by 2x Nagios every minute and
> 2x Cacti (2 threads each) every 5 minutes.
> As shown in the logs, this error does not occur immediately, but after a
> while, in the latter case after 11 hours.
>
> To rule out a hardware fault, I replaced the server completely; it did
> not help.
>
>> Can you also start it with debug output for all pthread calls enabled:
>> /usr/sbin/owserver -p 30003 -d /dev/ttyS0 -t 30 --foreground
>> --fatal_debug --fatal_debug_file=/tmp/owserver_fatal
>>
>> If /tmp/owserver_fatal.pid is created and filled with debug messages,
>> then some pthread calls are failing on your server.
>
> I added --fatal_debug to /etc/sysconfig/owserver, but forgot to add it
> when running under valgrind. Added now ...
>
>> BTW: I will be on vacation from December 24th to 31st, so I will
>> not be able to do much more on this after tonight.
>
> When owserver runs under valgrind, the system performs its
> functions... :)
>
> Merry Christmas! :)
>
> Serg.
>
>> /Christian
>>
>>
>> -----Original Message-----
>> From: Serg Oskin [mailto:s...@oskin.ru]
>> Sent: Tuesday, December 23, 2008 12:11 PM
>> To: owfs-developers@lists.sourceforge.net
>> Subject: Re: [Owfs-developers] general protection
>>
>> If I run owserver as a daemon, it crashes in such cases.
>> If I run it under
>> valgrind ... /usr/sbin/owserver -p 30003 -d /dev/ttyS0 -t 30 --foreground
>> it continues to work. In my experience this is also a sign of memory
>> leaks.
>>
>> This began to occur when owserver started being used by several (4-6)
>> clients simultaneously, rather than 1-2 as before.
>>
>> Serg.
>>
>>> I'm pretty clueless as to why you get this error message when
>>> "memcpy(&(pn->lock), &(opaque->key), sizeof(struct devlock *));" is
>>> called.
>>> ==00:01:31:14.563 2670== Invalid read of size 8
>>> ==00:01:31:14.563 2670==    at 0x4C568B7: LockGet (ow_locks.c:193)
>>> ==00:01:31:14.563 2670==    by 0x4C5D8A1: FS_r_given_bus (ow_read.c:231)
>>> ==00:01:31:14.563 2670==    by 0x4C5DB01: FS_read_distribute (ow_read.c:193)
>>> But... your owserver didn't seem to crash this time? Is the major
>>> problem fixed now?
>>> I checked in various fixes a few minutes ago, but nothing in particular
>>> that should affect this issue.
>>>
>>> /Christian
>>>
>>>
>>> -----Original Message-----
>>> From: Serg Oskin [mailto:s...@oskin.ru]
>>> Sent: Monday, December 22, 2008 7:50 PM
>>> To: owfs-developers@lists.sourceforge.net
>>> Subject: Re: [Owfs-developers] general protection
>>>
>>> I later corrected this. :)
>>> But this did not help. :(
>>>
>>> Serg.
>>>
>>>> Sorry... I made a typo in the code... the memcpy row should look
>>>> like this:
>>>> memcpy(&(pn->lock), &(opaque->key), sizeof(struct devlock *));
>>>>
>>>> I forgot to take the addresses of the variables, and therefore it
>>>> ended up with a segmentation fault instead...
>>>> Can you try to change the row and recompile with memcpy again?
>>>>
>>>> /Christian
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Serg Oskin [mailto:s...@oskin.ru]
>>>> Sent: Monday, December 22, 2008 12:13 PM
>>>> To: owfs-developers@lists.sourceforge.net
>>>> Subject: Re: [Owfs-developers] general protection
>>>>
>>>>> Now running CVS-version on 9:30 UTC 2008-12-22 ...
>>>> The results are in the attached files.
>>>>
>>>> Serg.
>>>>
>>>>> Serg.
>>>>>
>>>>>> Hi Serg,
>>>>>>
>>>>>> Interesting log files... It seems that your compiler generates
>>>>>> wrong code...
>>>>>> ==00:06:33:57.651 2275== Invalid read of size 8
>>>>>> ==00:06:33:57.651 2275==    at 0x4C56559: LockGet (ow_locks.c:195)
>>>>>>
>>>>>> ==00:06:33:57.651 2275==  Address 0x5A0D750 is 0 bytes inside a
>>>>>> block of size 32 free'd
>>>>>> ==00:06:33:57.651 2275==    at 0x4A0541E: free (vg_replace_malloc.c:233)
>>>>>> ==00:06:33:57.651 2275==    by 0x30D8ED02B6: tdelete (in /lib64/libc-2.5.so)
>>>>>> tsearch() seems to return a pointer to opaque, but
>>>>>> "pn->lock = opaque->key" results in "Invalid read of size 8"... ?
>>>>>> I have made some changes in the code, and that might fix the problem.
>>>>>> The size of "struct devlock" might have been unknown in some places,
>>>>>> and I have moved the definitions around a bit.
>>>>>>
>>>>>> Can you check out the latest CVS version and try it?
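
A note on the trace quoted above: the address being read lies inside a 32-byte block that tdelete() freed on behalf of LockRelease(). That pattern would also fit a tree node being removed while another thread still uses the pointer tsearch() returned, which is only a guess from the trace and not a confirmed diagnosis. Below is a minimal sketch of one way to avoid that situation. All names in it (devlock_sketch, lock_get, lock_release, tree_mutex, lock_tree) are hypothetical and are not the real ow_locks.c code: the node is dereferenced while the mutex guarding the tree is held, and a user count decides when tdelete() may run.

```c
#include <assert.h>
#include <pthread.h>
#include <search.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-device record; NOT the real owlib "struct devlock". */
struct devlock_sketch {
    char sn[8];   /* device serial number, used as the tree key */
    int users;    /* number of callers currently holding this record */
};

static void *lock_tree = NULL;   /* root of the tsearch() tree */
static pthread_mutex_t tree_mutex = PTHREAD_MUTEX_INITIALIZER;

static int compare_sn(const void *a, const void *b)
{
    return memcmp(((const struct devlock_sketch *)a)->sn,
                  ((const struct devlock_sketch *)b)->sn, 8);
}

/* Find or create the record for sn. The node returned by tsearch() is
 * dereferenced while tree_mutex is still held, so no other thread can
 * tdelete() it in between. Returns NULL if allocation fails. */
static struct devlock_sketch *lock_get(const char *sn)
{
    struct devlock_sketch *candidate = malloc(sizeof *candidate);
    struct devlock_sketch *found = NULL;
    void *node;

    if (candidate == NULL)
        return NULL;
    memcpy(candidate->sn, sn, 8);
    candidate->users = 0;

    pthread_mutex_lock(&tree_mutex);
    node = tsearch(candidate, &lock_tree, compare_sn);
    if (node != NULL) {
        found = *(struct devlock_sketch **)node;  /* read under the mutex */
        found->users++;
    }
    pthread_mutex_unlock(&tree_mutex);

    if (found != candidate)
        free(candidate);   /* an existing record was reused, or tsearch failed */
    return found;
}

/* Drop one reference; only the last user removes the node and frees it. */
static void lock_release(struct devlock_sketch *lock)
{
    int last;

    pthread_mutex_lock(&tree_mutex);
    assert(lock->users > 0);
    last = (--lock->users == 0);
    if (last)
        tdelete(lock, &lock_tree, compare_sn);
    pthread_mutex_unlock(&tree_mutex);

    if (last)
        free(lock);
}
```

Two lock_get() calls for the same serial number return the same record with users == 2, and the record is freed only after the second lock_release(). Whether the real LockGet()/LockRelease() need something like this depends on how they already serialize access to the tree, which the trace alone does not show.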
>>>>>>
>>>>>>
>>>>>>
>>>>>> If this doesn't work, you can try to edit
>>>>>> module/owlib/src/c/ow_locks.c and change two rows...
>>>>>> pn->lock = (struct devlock *)opaque->key; /* Serg: Invalid read of size 8 */
>>>>>> /* Why should a pointer compare fail? Unaligned memory?
>>>>>> Perhaps try to copy the pointer with memcpy() instead.
>>>>>> Will this help?
>>>>>> */
>>>>>> //memcpy(pn->lock, opaque->key, sizeof(struct devlock *));
>>>>>>
>>>>>> Comment out the row "pn->lock = " and uncomment the memcpy instead....
>>>>>> Will this work better for you?
>>>>>> It should at least remove the "Invalid read of size 8" warning,
>>>>>> and perhaps everything will work then as well.
>>>>>>
>>>>>> BTW: Which platform are you using? I have a feeling that your
>>>>>> source isn't compiled with -m64, even though it should be...
>>>>>> Could you look at host_cpu in config.log and at "uname -a":
>>>>>> # grep host_cpu cvs/owfs/config.log
>>>>>> # uname -a
>>>>>>
>>>>>> /Christian
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Serg Oskin [mailto:s...@oskin.ru]
>>>>>> Sent: Saturday, December 20, 2008 10:33 PM
>>>>>> To: owfs-developers@lists.sourceforge.net
>>>>>> Subject: Re: [Owfs-developers] general protection
>>>>>>
>>>>>> Ctrl-C pressed.
>>>>>>
>>>>>>
>>>>>>> Tried it; the result is the same as before.
>>>>>>> Version: from CVS at Dec 20 2008 12:00 UTC.
>>>>>>>
>>>>>>> I received a message in /tmp/owfs_fatal only once, during the
>>>>>>> "kill owserver_pid":
>>>>>>> ow_connect.c:322 mutex_destroy failed rc=16 [Device or resource busy]
>>>>>>>
>>>>>>> Serg.
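
On the memcpy() suggestion quoted earlier in this message: copying a pointer with memcpy(), using the addresses of both pointer variables, reads the same 8 bytes from the same place as the plain assignment does. If those bytes sit in freed memory, either form should trigger the same valgrind report. The sketch below illustrates only that equivalence; node_sketch and parsedname_sketch are hypothetical stand-ins and not the real owlib structures.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the owlib types; the real definitions differ. */
struct devlock { int users; };
struct node_sketch { struct devlock *key; };         /* role of "opaque" */
struct parsedname_sketch { struct devlock *lock; };  /* role of "pn" */

/* Plain assignment: reads sizeof(pointer) bytes out of the node. */
static void copy_by_assignment(struct parsedname_sketch *pn,
                               const struct node_sketch *opaque)
{
    pn->lock = opaque->key;
}

/* memcpy with the ADDRESSES of both pointers: copies the pointer value
 * itself, so it reads exactly the same bytes as the assignment above. */
static void copy_by_memcpy(struct parsedname_sketch *pn,
                           const struct node_sketch *opaque)
{
    assert(sizeof pn->lock == sizeof opaque->key);
    memcpy(&(pn->lock), &(opaque->key), sizeof(struct devlock *));
}

/* For contrast (deliberately not compiled): without the '&' operators,
 *     memcpy(pn->lock, opaque->key, sizeof(struct devlock *));
 * copies bytes of the pointed-to struct into whatever pn->lock currently
 * points at, which crashes when pn->lock is NULL or uninitialised. */
```

Both functions leave pn->lock holding the same address, so choosing one over the other changes neither which memory is read nor whether that memory is still valid.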

------------------------------------------------------------------------------
_______________________________________________
Owfs-developers mailing list
Owfs-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/owfs-developers