On Apr 14, 2007, at 5:03 PM, Arno Lehmann wrote:

> I've got no idea why the SD would need that much memory... usually I
> don't notice a remarkable memory consumption by the SD.
>
> Can you reproduce the problem?


Yes, it continues to happen. I'm just not sure how to check which code
is causing the memory consumption, unless it happens to be the
section that throws the final error and triggers the traceback.
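One low-effort way to at least correlate the memory growth with particular jobs (it won't pinpoint the code) might be to sample bacula-sd's resident memory at intervals and compare the timestamps against the log. This is a minimal sketch, assuming a Linux host with procfs mounted at /proc (the post doesn't say which OS the storage server runs). The function names are hypothetical.

```python
import time
from pathlib import Path
from typing import List, Optional, Tuple


def parse_vm_rss_kb(status_text: str) -> Optional[int]:
    """Return VmRSS (in kB) from the text of /proc/<pid>/status, or None if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:\t  123456 kB"
            return int(line.split()[1])
    return None


def sample_rss(pid: int, interval_s: float = 60.0,
               samples: int = 10) -> List[Tuple[float, Optional[int]]]:
    """Poll a process's resident set size (hypothetical helper; Linux /proc only)."""
    readings = []
    for i in range(samples):
        status = Path(f"/proc/{pid}/status").read_text()
        readings.append((time.time(), parse_vm_rss_kb(status)))
        if i < samples - 1:
            # Sleep only between samples, not after the last one
            time.sleep(interval_s)
    return readings
```

For example, `sample_rss(<current bacula-sd PID>, interval_s=300, samples=288)` would cover roughly a day. A steady climb that starts around the first "didn't accept Device" error would point at the reservation path rather than at the scripts.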

Additional factoids:

* A possible symptom: a client-run-after-job script executes and
completes without error, but the job carries on as if the script
had not finished, until Max Run Time is exceeded. (This looks
SD-bound: the same FD running the same scripts against a different SD
has no problem.)

* Yet the basic problem still happens even when the SD runs only
non-scripting jobs. (Or perhaps the script issue is a separate concern.)

* One job may end with the error "Storage daemon didn't accept Device  
"FileStorage" command.", but it doesn't mean the SD crashed at that  
point. In fact, the log shows that around a day after that specific  
job ended, the SD is still trying to reserve a device for the job.

* In addition to that job, the "didn't accept Device" error also
appears for jobs that were still waiting ("intervention needed...")
when the SD crashed with "out of memory". (By then, those jobs were
well past their Max Start Delay.)

* The last successful job using the SD before the first "didn't  
accept device" error does not exhibit this "forever reservation" effect.


Traceback/Log text

> storageserver: bget_msg.c:71 Got BNET_EOD
> 15-Apr 08:42 storageserver: ABORTING due to ERROR in smartall.c:144
> Out of memory
> storageserver: bget_msg.c:71 Got BNET_EOD
> Kaboom! bacula-sd, storageserver got signal 11. Attempting traceback.
> Kaboom! exepath=/etc/bacula
> storageserver: signal.c:138 Working=/var/bacula
> storageserver: signal.c:139 btpath=/etc/bacula/btraceback
> storageserver: signal.c:140 exepath=/etc/bacula/bacula-sd
> storageserver: bget_msg.c:71 Got BNET_EOD
> storageserver: signal.c:165 Doing waitpid
> storageserver: bget_msg.c:71 Got BNET_EOD
> Calling: /etc/bacula/btraceback /etc/bacula/bacula-sd 5474
> storageserver: bget_msg.c:71 Got BNET_EOD
[21 repeats of previous message]
> Traceback complete, attempting cleanup ...
> storageserver: signal.c:168 Done waitpid
> storageserver: jcr.c:167 write_last_jobs seek to 188
> storageserver: bget_msg.c:71 Got BNET_EOD
> storageserver: bget_msg.c:71 Got BNET_EOD
> storageserver: stored.c:577 In terminate_stored() sig=11
> storageserver: stored.c:580 Term device /backup/bacula/
> storageserver: reserve.c:232 free_volume storageserver.F.0038
> storageserver: dev.c:1849 close_dev "FileStorage" (/backup/bacula/)
> storageserver: reserve.c:213 free_volume: no vol on dev "FileStorage" (/backup/bacula/)
> storageserver: reserve.c:223 free_volume storageserver.D.0001 dev="FileStorage" (/backup/bacula/)
> storageserver: bget_msg.c:71 Got BNET_EOD
> storageserver: bget_msg.c:71 Got BNET_EOD


--
--Darien A. Hager
[EMAIL PROTECTED]



_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users