Hello,

16.07.2007 12:40, Kern Sibbald wrote:
> Hello Arno,
> 
> On Monday 16 July 2007 12:29, Arno Lehmann wrote:
>> Hi,
>>
>> 16.07.2007 11:21, Alfredo Marchini wrote:
>>> Hi,
>>> bacula-dir is blocked again:
>>> I've just run these tests:
>>>
>>> - time command from bconsole: works
>> Good, so the DIR is basically up and running.
...
>> The DIR trace file:
>>
>>> 14-Jul 09:18 oracolo-director: Fatal Error at bnet_server.c:172 because:
>>> Error in select: Unknown error 514
> 
> What is always important is the first error, and in this case, it is an error 
> in select(), which means either the Bacula memory has been seriously damaged, 
> or that there is a problem with your OS (i.e. kernel, CPU, memory, ...).

Right, which is the reason for my later remarks :-)

> 
>>> 14-Jul 09:18 oracolo-director: ABORTING due to ERROR in smartall.c:193
>>> qp->qnext->qprev != qp called from dlist.c:341
>>> 14-Jul 09:18 oracolo-director: Fatal Error because: Bacula interrupted by signal 11: Segmentation violation
>>> 14-Jul 09:18 oracolo-director: Fatal Error at bnet_server.c:172 because:
>>> Error in select: Unknown error 514
>>> 14-Jul 09:18 oracolo-director: ABORTING due to ERROR in smartall.c:193
>>> qp->qnext->qprev != qp called from dlist.c:341
>>> 14-Jul 09:18 oracolo-director: Fatal Error because: Bacula interrupted by signal 11: Segmentation violation
>> This looks bad. I suppose it's worth a bug report on bugs.bacula.org, 
>> and/or an email to the developers list. (I cc it there...)
>>
>> Error 514 in select does not sound like a problem in the DIR code, but 
>> the error handling could perhaps catch this sort of problem.
>>
>> A very quick search for error code 514 revealed this:
>>
>>> /* Should never be seen by user programs */
>>> #define ERESTARTSYS     512
>>> #define ERESTARTNOINTR  513
>>> #define ERESTARTNOHAND  514     /* restart if no handler.. */
>>> #define ENOIOCTLCMD     515     /* No ioctl command */
>>> #define ERESTART_RESTARTBLOCK 516 /* restart by calling sys_restart_syscall */
>> which might indicate a kernel problem (if you encounter this on Linux 
>> 2.6...)
> 
> Yes, either a kernel problem or a hardware problem seems the most likely.  We 
> cannot exclude a Bacula bug, but the finger is pointing to the CPU/hardware.

Well, this is problematic... Alfredo gave good reasons to assume that 
it's not purely hardware/OS related. Basically, the problem occurs 
when he runs certain jobs.

I guess that the interworking of DIR, SD, catalog database, and OS 
might trigger some sort of resource exhaustion, but debugging this is 
beyond my abilities :-)
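
To illustrate the remark above that the error handling could perhaps 
catch this sort of problem, here is a minimal sketch in C. It is not 
the code in bnet_server.c and not a proposed patch. The names 
classify_select_errno(), select_checked() and the K_* constants are 
hypothetical; the numeric values are copied from the header excerpt 
quoted above, since these codes are not part of the userspace 
<errno.h>. The assumption is that a caller would rather log and 
retry or shut down cleanly than abort on such a code.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Kernel-internal restart codes, values taken from the quoted header.
 * Hypothetical local names: userspace headers do not define these. */
#define K_ERESTARTSYS           512
#define K_ERESTART_RESTARTBLOCK 516

enum select_verdict {
    SELECT_OK,               /* no error */
    SELECT_RETRY,            /* interrupted by a signal, safe to retry */
    SELECT_KERNEL_INTERNAL,  /* code that should never reach user space */
    SELECT_FATAL             /* any other errno value */
};

/* Map an errno value from select() to a coarse verdict. */
enum select_verdict classify_select_errno(int err)
{
    if (err == 0)
        return SELECT_OK;
    if (err == EINTR)
        return SELECT_RETRY;
    if (err >= K_ERESTARTSYS && err <= K_ERESTART_RESTARTBLOCK)
        return SELECT_KERNEL_INTERNAL;
    return SELECT_FATAL;
}

/* Call select() on a read set, retrying on EINTR.  The set and the
 * timeout are copied on every attempt because the timeout is
 * undefined after an error on Linux.  Returns select()'s result and
 * stores the verdict so the caller can decide how to react. */
int select_checked(int nfds, fd_set *readfds,
                   const struct timeval *timeout,
                   enum select_verdict *verdict)
{
    for (;;) {
        fd_set work = *readfds;
        struct timeval tv;
        struct timeval *tvp = NULL;

        if (timeout != NULL) {
            tv = *timeout;
            tvp = &tv;
        }

        int n = select(nfds, &work, NULL, NULL, tvp);
        if (n >= 0) {
            *readfds = work;      /* hand the ready set back */
            *verdict = SELECT_OK;
            return n;
        }

        *verdict = classify_select_errno(errno);
        if (*verdict != SELECT_RETRY)
            return -1;            /* caller inspects *verdict */
    }
}
```

A caller that gets SELECT_KERNEL_INTERNAL back could log the raw 
errno value and point at the kernel or hardware rather than at the 
application. This only makes the symptom easier to read; it does 
nothing about whatever corrupted memory in the first place.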

> I recommend shutting down your machine, rebooting it, running memtest, and if 
> all is OK, restarting Bacula to see what happens.

Fortunately, that's not my machine :-)

Unfortunately, my backup server is dying, but I know and understand 
that problem :-(

Arno

> Regards,
> 
> Kern
> 
>> Arno
>>
>> -- 
>> Arno Lehmann
>> IT-Service Lehmann
>> www.its-lehmann.de
>>
>> -------------------------------------------------------------------------
>> This SF.net email is sponsored by DB2 Express
>> Download DB2 Express C - the FREE version of DB2 express and take
>> control of your XML. No limits. Just data. Click to get it now.
>> http://sourceforge.net/powerbar/db2/
>> _______________________________________________
>> Bacula-devel mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/bacula-devel
>>

-- 
Arno Lehmann
IT-Service Lehmann
www.its-lehmann.de

