On 04 May 2014, at 10:18 PM, Graham Leggett <[email protected]> wrote:
> We got "ber_get_next failed for connection 2" without being told what exactly
> failed. Will dig on the other side to see if there are any clues.
On the other side we see this:
[04/May/2014:21:29:10 +0100] NSMMReplicationPlugin - agmt_add: begin
[04/May/2014:21:29:10 +0100] NSMMReplicationPlugin - agmt="cn=Agreement
serverc.example.com" (serverc:636): Trying secure slapi_ldap_init_ext
[04/May/2014:21:29:10 +0100] NSMMReplicationPlugin - agmt="cn=Agreement
serverc.example.com" (serverc:636): binddn = cn=Replication Manager,c
n=config, passwd = {DES}xxx
[04/May/2014:21:29:11 +0100] NSMMReplicationPlugin - agmt="cn=Agreement
serverc.example.com" (serverc:636): No linger to cancel on the connec
tion
[04/May/2014:21:29:12 +0100] - _csngen_adjust_local_time: gen state before
53670e7b0001:1399235244:0:27599
[04/May/2014:21:29:12 +0100] - _csngen_adjust_local_time: gen state after
53670ee70000:1399235352:0:27599
[04/May/2014:21:29:12 +0100] NSMMReplicationPlugin - agmt="cn=Agreement
serverc.example.com" (serverc:636): Replica was successfully acquired
.
[04/May/2014:21:29:12 +0100] NSMMReplicationPlugin - Beginning total update of
replica "agmt="cn=Agreement serverc.example.com" (serverc:636)
".
[04/May/2014:21:29:12 +0100] - conn 2 leaving turbo mode due to 4
[04/May/2014:21:29:12 +0100] - listener got signaled
[04/May/2014:21:29:19 +0100] NSMMReplicationPlugin - agmt="cn=Agreement
serverc.example.com" (serverc:636): Failed to send extended operation: LDAP
error -1 (Can't contact LDAP server)
The above message is printed by the following code:
slapi_log_error(SLAPI_LOG_FATAL, repl_plugin_name,
"%s: Failed to send %s operation: LDAP
error %d (%s)\n",
agmt_get_long_name(conn->agmt),
op_string ? op_string : "NULL", rc,
ldap_err2string(rc));
The above error is bogus - there is no such error as "error -1" in LDAP, it
turns out that at the top of perform_operation() the rc variable is set as
follows and appears to drop through to the error message without anything
happening:
int rc = -1;
Looking at how rc could drop through this code untouched we see this below,
there is no default section in the switch, and so it seems optype has a value
other than those handled:
switch (optype)
{
case CONN_ADD:
conn->status = STATUS_PROCESSING_ADD;
op_string = "add";
rc = ldap_add_ext(conn->ld, dn, attrs, server_controls,
NULL /* clientctls */, &msgid);
break;
case CONN_MODIFY:
conn->status = STATUS_PROCESSING_MODIFY;
op_string = "modify";
rc = ldap_modify_ext(conn->ld, dn, attrs,
server_controls,
NULL /* clientctls */, &msgid);
break;
case CONN_DELETE:
conn->status = STATUS_PROCESSING_DELETE;
op_string = "delete";
rc = ldap_delete_ext(conn->ld, dn, server_controls,
NULL /* clientctls */, &msgid);
break;
case CONN_RENAME:
conn->status = STATUS_PROCESSING_RENAME;
op_string = "rename";
rc = ldap_rename(conn->ld, dn, newrdn, newparent,
deleteoldrdn,
server_controls, NULL /* clientctls */, &msgid);
break;
case CONN_EXTENDED_OPERATION:
conn->status = STATUS_PROCESSING_EXTENDED_OPERATION;
op_string = "extended";
rc = ldap_extended_operation(conn->ld, extop_oid,
extop_payload,
server_controls, NULL /* clientctls */, &msgid);
}
Putting in a breakpoint at perform_operation() shows that optype each time is
always CONN_EXTENDED_OPERATION, so it is looking like ldap_extended_operation()
is returning -1. The code is optimised, so I can't be sure though.
The log continues:
[04/May/2014:21:29:19 +0100] NSMMReplicationPlugin - agmt="cn=Agreement
serverc.example.com" (serverc:636): Disconnected from the consumer
[04/May/2014:21:29:19 +0100] NSMMReplicationPlugin - agmt="cn=Agreement
serverc.example.com" (serverc:636): Connection disconnected by another thread
[04/May/2014:21:29:19 +0100] NSMMReplicationPlugin - agmt="cn=Agreement
serverc.example.com" (serverc:636): Received error -1 (Can't contact LDAP
server): for total update operation
[04/May/2014:21:29:19 +0100] - repl5_tot_waitfor_async_results: 258 -1
[04/May/2014:21:29:20 +0100] NSMMReplicationPlugin - agmt="cn=Agreement
serverc.example.com" (serverc:636): Warning: unable to send endReplication
extended operation (Can't contact LDAP server)
[04/May/2014:21:29:20 +0100] NSMMReplicationPlugin - agmt="cn=Agreement
serverc.example.com" (serverc:636): repl5_tot_run: failed to obtain data to
send to the consumer; LDAP error - -2
[04/May/2014:21:29:20 +0100] NSMMReplicationPlugin - agmt="cn=Agreement
serverc.example.com" (serverc:636): No linger to cancel on the connection
[04/May/2014:21:29:20 +0100] NSMMReplicationPlugin - agmt="cn=Agreement
serverc.example.com" (serverc:636): Disconnected from the consumer
It appears having failed above we pretend we didn't and try and connect again.
This time we succeed, but because we've never initialised successfully, we
throw our error "Replica has a different generation ID than the local data.":
[04/May/2014:21:29:20 +0100] NSMMReplicationPlugin - agmt="cn=Agreement
serverc.example.com" (serverc:636): State: start -> ready_to_acquire_replica
[04/May/2014:21:29:20 +0100] NSMMReplicationPlugin - agmt="cn=Agreement
serverc.example.com" (serverc:636): Trying secure slapi_ldap_init_ext
[04/May/2014:21:29:20 +0100] NSMMReplicationPlugin - agmt="cn=Agreement
serverc.example.com" (serverc:636): binddn = cn=Replication Manager,cn=config,
passwd = {DES}xxx
[04/May/2014:21:29:21 +0100] NSMMReplicationPlugin - agmt="cn=Agreement
serverc.example.com" (serverc:636): Replication bind with SIMPLE auth resumed
[04/May/2014:21:29:21 +0100] NSMMReplicationPlugin - agmt="cn=Agreement
serverc.example.com" (serverc:636): No linger to cancel on the connection
[04/May/2014:21:29:21 +0100] - _csngen_adjust_local_time: gen state before
53670ee70001:1399235352:0:27599
[04/May/2014:21:29:21 +0100] - _csngen_adjust_local_time: gen state after
53670ef00000:1399235361:0:27599
[04/May/2014:21:29:21 +0100] NSMMReplicationPlugin - agmt="cn=Agreement
serverc.example.com" (serverc:636): Replica was successfully acquired.
[04/May/2014:21:29:21 +0100] NSMMReplicationPlugin - agmt="cn=Agreement
serverc.example.com" (serverc:636): State: ready_to_acquire_replica ->
sending_updates
[04/May/2014:21:29:21 +0100] NSMMReplicationPlugin - agmt="cn=Agreement
serverc.example.com" (serverc:636): Replica has a different generation ID than
the local data.
[04/May/2014:21:29:22 +0100] NSMMReplicationPlugin - agmt="cn=Agreement
serverc.example.com" (serverc:636): Successfully released consumer
[04/May/2014:21:29:22 +0100] NSMMReplicationPlugin - agmt="cn=Agreement
serverc.example.com" (serverc:636): Beginning linger on the connection
[04/May/2014:21:29:22 +0100] NSMMReplicationPlugin - agmt="cn=Agreement
serverc.example.com" (serverc:636): State: sending_updates -> start_backoff
Regards,
Graham
--
--
389 users mailing list
[email protected]
https://admin.fedoraproject.org/mailman/listinfo/389-users