On Apr 14, 2007, at 5:03 PM, Arno Lehmann wrote:
> I've got no idea why the SD would need that much memory... usually I
> don't notice a remarkable memory consumption by the SD.
>
> Can you reproduce the problem?

Yes, it continues to happen--I'm just not sure how to check what code is
causing the memory consumption, unless it happens to be the section which
throws the final error that causes the traceback.

Additional factoids:

* A possible symptom is a client-runs-after-job script which executes and
  completes without error, but the job continues as if the script did not
  finish until Max Run Time is exceeded. (SD-bound issue: the same FD on a
  different SD with the same scripts has no problem.)

* Yet the basic problem still happens even if the SD has only
  non-scripting jobs. (Or perhaps the script thing is a separate concern.)

* One job may end with the error "Storage daemon didn't accept Device
  "FileStorage" command.", but it doesn't mean the SD crashed at that
  point. In fact, the log shows that around a day after that specific job
  ended, the SD is still trying to reserve a device for the job.

* In addition to the previous job, the "accept device" error also appears
  for jobs which were still waiting ("intervention needed...") when the SD
  crashed with "out of memory". (The jobs are well past their
  Max-Start-Delay time by that time.)

* The last successful job using the SD before the first "didn't accept
  device" error does not exhibit this "forever reservation" effect.

Traceback/Log text

> storageserver: bget_msg.c:71 Got BNET_EOD
> 15-Apr 08:42 storageserver: ABORTING due to ERROR in smartall.c:144
> Out of memory
> storageserver: bget_msg.c:71 Got BNET_EOD
> Kaboom! bacula-sd, storageserver got signal 11. Attempting traceback.
> Kaboom! exepath=/etc/bacula
> storageserver: signal.c:138 Working=/var/bacula
> storageserver: signal.c:139 btpath=/etc/bacula/btraceback
> storageserver: signal.c:140 exepath=/etc/bacula/bacula-sd
> storageserver: bget_msg.c:71 Got BNET_EOD
> storageserver: signal.c:165 Doing waitpid
> storageserver: bget_msg.c:71 Got BNET_EOD
> Calling: /etc/bacula/btraceback /etc/bacula/bacula-sd 5474
> storageserver: bget_msg.c:71 Got BNET_EOD
[21 repeats of previous message]
> Traceback complete, attempting cleanup ...
> storageserver: signal.c:168 Done waitpid
> storageserver: jcr.c:167 write_last_jobs seek to 188
> storageserver: bget_msg.c:71 Got BNET_EOD
> storageserver: bget_msg.c:71 Got BNET_EOD
> storageserver: stored.c:577 In terminate_stored() sig=11
> storageserver: stored.c:580 Term device /backup/bacula/
> storageserver: reserve.c:232 free_volume storageserver.F.0038
> storageserver: dev.c:1849 close_dev "FileStorage" (/backup/bacula/)
> storageserver: reserve.c:213 free_volume: no vol on dev "FileStorage" (/backup/bacula/)
> storageserver: reserve.c:223 free_volume storageserver.D.0001 dev="FileStorage" (/backup/bacula/)
> storageserver: bget_msg.c:71 Got BNET_EOD
> storageserver: bget_msg.c:71 Got BNET_EOD

--
--Darien A. Hager
[EMAIL PROTECTED]