On Mon, Oct 11, 2010 at 4:16 PM, Avinash Lakshman <
avinash.laksh...@gmail.com> wrote:

> tickTime = 2000, initLimit = 3000 and the data is around 11GB this is log +
> snapshot. So if I need to add a new observer can I transfer state from the
> ensemble manually before starting it? If so which files do I need to
> transfer?
>
>
You can't really do it manually. As part of the "bring up" process, a
server communicates with the current leader and downloads the appropriate
data (either a diff of the recent changes or a full snapshot if it is too far
behind). Try increasing your initLimit to 15 or so (btw, that's in ticks,
not milliseconds, so if you have 3000 now that's probably not the issue ;-)
). You might also want to increase the syncLimit at the same time. Here's
the relevant part of the sample conf that ships with the release:

# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
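
To make the units concrete, here is a rough sketch of the arithmetic (not
ZooKeeper code; the helper name limit_ms is made up for illustration). Each
limit is a count of ticks, so the wall-clock budget is tickTime multiplied
by the limit:

```python
# Illustrative arithmetic only; limit_ms is a hypothetical helper,
# not part of any ZooKeeper API.

def limit_ms(tick_time_ms: int, limit_ticks: int) -> int:
    """Wall-clock budget in milliseconds for a limit expressed in ticks."""
    return tick_time_ms * limit_ticks


if __name__ == "__main__":
    tick_time_ms = 2000  # tickTime reported in the original question
    settings = [
        ("initLimit", 10),    # sample conf default
        ("initLimit", 15),    # suggested value
        ("initLimit", 3000),  # value reported in the question
        ("syncLimit", 5),     # sample conf default
    ]
    for name, ticks in settings:
        ms = limit_ms(tick_time_ms, ticks)
        print(f"{name}={ticks}: {ms} ms ({ms / 1000:.0f} s)")
```

With tickTime=2000, initLimit=15 allows 30 seconds for the initial sync,
while initLimit=3000 would allow 100 minutes, which is why a value of 3000
is unlikely to be what is timing out.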

Patrick



> Thanks
>
> On Mon, Oct 11, 2010 at 10:16 AM, Benjamin Reed <br...@yahoo-inc.com>
> wrote:
>
> >  how big is your data? you may be running into the problem where it takes
> > too long to do the state transfer and times out. check the initLimit and the
> > size of your data.
> >
> > ben
> >
> >
> > On 10/10/2010 08:57 AM, Avinash Lakshman wrote:
> >
> >> Thanks Ben. I am not mixing processes of different clusters. I just double
> >> checked that. I have ZK deployed in a 5 node cluster and I have 20
> >> observers. I just started the 5 node cluster w/o starting the observers. I
> >> still see the same issue. Now my cluster won't start up. So what is the
> >> correct workaround to get this going? How can I find out who the leader is
> >> and who the follower is, to get more insight?
> >>
> >> Thanks
> >> A
> >>
> >> On Sun, Oct 10, 2010 at 8:33 AM, Benjamin Reed<br...@yahoo-inc.com>
> >>  wrote:
> >>
> >>> this usually happens when a follower closes its connection to the leader.
> >>> it is usually caused by the follower shutting down or failing. you may get
> >>> further insight by looking at the follower logs. you should really run with
> >>> timestamps on so that you can correlate the logs of the leader and follower.
> >>>
> >>> one thing that is strange is the wide divergence between zxid of follower
> >>> and leader. are you mixing processes of different clusters?
> >>>
> >>> ben
> >>>
> >>> ________________________________________
> >>> From: Avinash Lakshman [avinash.laksh...@gmail.com]
> >>> Sent: Sunday, October 10, 2010 8:18 AM
> >>> To: zookeeper-user
> >>> Subject: What does this mean?
> >>>
> >>> I see this exception and the servers not doing anything.
> >>>
> >>> java.io.IOException: Channel eof
> >>>        at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:630)
> >>> ERROR - 124554051584(higestZxid)>  21477836646(next log) for type -11
> >>> WARN - Sending snapshot last zxid of peer is 0xe00000000  zxid of leader is 0x1e00000000
> >>> WARN - Sending snapshot last zxid of peer is 0x1800000000  zxid of leader is 0x1e00000000
> >>> WARN - Sending snapshot last zxid of peer is 0x5002dc766  zxid of leader is 0x1e00000000
> >>> WARN - Sending snapshot last zxid of peer is 0x1c00000000  zxid of leader is 0x1e00000000
> >>> ERROR - Unexpected exception causing shutdown while sock still open
> >>> java.net.SocketException: Broken pipe
> >>>        at java.net.SocketOutputStream.socketWrite0(Native Method)
> >>>        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> >>>        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
> >>>        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> >>>        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:78)
> >>>        at java.io.DataOutputStream.writeInt(DataOutputStream.java:180)
> >>>        at org.apache.jute.BinaryOutputArchive.writeInt(BinaryOutputArchive.java:55)
> >>>        at org.apache.zookeeper.data.StatPersisted.serialize(StatPersisted.java:116)
> >>>        at org.apache.zookeeper.server.DataNode.serialize(DataNode.java:167)
> >>>        at org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:123)
> >>>        at org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:967)
> >>>        at org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:982)
> >>>        at org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:982)
> >>>        at org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:982)
> >>>        at org.apache.zookeeper.server.DataTree.serialize(DataTree.java:1031)
> >>>        at org.apache.zookeeper.server.util.SerializeUtils.serializeSnapshot(SerializeUtils.java:104)
> >>>        at org.apache.zookeeper.server.ZKDatabase.serializeSnapshot(ZKDatabase.java:426)
> >>>        at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:331)
> >>> WARN - ******* GOODBYE /10.138.34.212:33272 ********
> >>>
> >>> Avinash
> >>>
> >>>
> >
>
