Paul Bayliss created DIRSERVER-1894:
---------------------------------------

             Summary: Multi-Master replicated startup does not complete
                 Key: DIRSERVER-1894
                 URL: https://issues.apache.org/jira/browse/DIRSERVER-1894
             Project: Directory ApacheDS
          Issue Type: Bug
          Components: ldap
    Affects Versions: 2.0.0-M15
            Reporter: Paul Bayliss


On startup of a directory instance configured as a replication consumer, the 
instance does not bind its local LDAP port until it has connected to the 
replication provider. In a 2-node multi-master setup this is a chicken-and-egg 
problem: neither node can start its LDAP port, so neither provider is ever 
reachable, and the following errors are repeated in the logs indefinitely.

Instance 1:

[12:58:26] ERROR [org.apache.directory.server.CONSUMER_LOG] - Failed to connect 
to the server localhost:11389, cause : Cannot connect on the server: Connection 
refused
[12:58:26] ERROR 
[org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl] 
- Failed to connect to the server localhost:11389, cause : Cannot connect on 
the server: Connection refused

Instance 2:

[12:58:14] ERROR [org.apache.directory.server.CONSUMER_LOG] - Failed to connect 
to the server localhost:10389, cause : Cannot connect on the server: Connection 
refused
[12:58:14] ERROR 
[org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl] 
- Failed to connect to the server localhost:10389, cause : Cannot connect on 
the server: Connection refused

netstat shows that the LDAP ports are not bound; the following command 
produces no output:

> netstat -a | egrep "10389|11389"
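The same check can be scripted without parsing netstat output. The sketch 
below is a hypothetical helper, not part of ApacheDS. It simply tries a TCP 
connection to each port. The port numbers come from the logs above, where 
instance 1 connects to 11389 and instance 2 connects to 10389. From that I 
infer that instance 1 listens on 10389 and instance 2 on 11389.

```python
import socket

def is_listening(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # ConnectionRefusedError, timeouts, unreachable host
        return False

if __name__ == "__main__":
    # 10389: instance 1, 11389: instance 2 (inferred from the logs above).
    for port in (10389, 11389):
        state = "LISTEN" if is_listening("localhost", port) else "not bound"
        print(f"localhost:{port} {state}")
```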


It is possible to trick the instances into starting up by first starting 
instance 1 without configuring it as a replication consumer, then starting 
instance 2. I then stop instance 1, change it to be a consumer, and restart it. 
After that both instances are running, and netstat shows the replication 
connections and the listening LDAP ports. Replication now works in both 
directions.

> netstat -a | egrep "10389|11389"
tcp4       0      0  localhost.10389        localhost.51051        ESTABLISHED
tcp4       0      0  localhost.51051        localhost.10389        ESTABLISHED
tcp46      0      0  *.10389                *.*                    LISTEN     
tcp4       0      0  localhost.11389        localhost.51050        ESTABLISHED
tcp4       0      0  localhost.51050        localhost.11389        ESTABLISHED
tcp46      0      0  *.11389                *.*                    LISTEN 
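The workaround succeeds because each consumer's provider is already listening 
by the time that consumer starts. The sequential model below shows this. It 
rests on the same assumption as the report: a consumer binds only after it 
reaches its provider. The model is simplified. It makes one connection attempt 
per start instead of retrying forever, and it ignores instance 2's consumer 
reconnecting while instance 1 is down. The function names are hypothetical.

```python
def start_instance(name, listening, provider=None):
    """Model the reported startup order: a consumer binds its LDAP port only
    after it has reached its replication provider (one attempt only)."""
    if provider is not None and provider not in listening:
        return False  # the real server keeps retrying; the port stays unbound
    listening.add(name)
    return True

def workaround():
    listening = set()
    steps = []
    # 1. Start instance 1 with no replication consumer configured.
    steps.append(start_instance("instance1", listening))
    # 2. Start instance 2 as a consumer of instance 1, which is already up.
    steps.append(start_instance("instance2", listening, provider="instance1"))
    # 3. Stop instance 1, make it a consumer of instance 2, and restart it.
    listening.discard("instance1")
    steps.append(start_instance("instance1", listening, provider="instance2"))
    return steps, listening

def cold_start():
    """Both instances configured as consumers from the beginning."""
    listening = set()
    first = start_instance("instance1", listening, provider="instance2")
    second = start_instance("instance2", listening, provider="instance1")
    return [first, second], listening
```

workaround() returns ([True, True, True], {"instance1", "instance2"}), while 
cold_start() returns ([False, False], set()).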

I will attach the configuration files of the two instances, which can be used 
to reproduce this problem.

