At Fri, 20 Jul 2012 17:12:21, Dietmar Maurer wrote:
After some more thinking, this is also wrong. It would be really great to
have some documentation about that: how exactly do reads/writes work
during recovery?
I first thought we could simply reject reads. For writes, we can reject [...]
So when we shut down a node, do all the other nodes start object recovery
immediately?

OK, let me explain with a simple example:
- 3 nodes with 1 TB disk space each, --copies 2
- 50% used

Now I want to install a new kernel on one node, so I need to reboot, which
takes about 3 minutes. At reboot, when sheepdog is stopped, both [...]
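For rough numbers on the example above (my own back-of-the-envelope arithmetic, not figures from the thread), assuming GbE links:

```python
# Recovery traffic estimate for the example: 3 nodes, 1 TB each, 50% used,
# --copies 2, one node rebooting. All figures are assumptions for illustration.
nodes = 3
capacity_gb = 1000
used = 0.5
stored_gb = nodes * capacity_gb * used  # 1500 GB of replicas cluster-wide
leaving_gb = stored_gb / nodes          # ~500 GB of replicas on the rebooting node
mb_per_s = 1000 / 8 * 0.9               # ~112 MB/s effective on a GbE link
hours = leaving_gb * 1000 / mb_per_s / 3600
print(f"{leaving_gb:.0f} GB to re-replicate, ~{hours:.1f} h on GbE")
# -> 500 GB to re-replicate, ~1.2 h on GbE
```

So a 3-minute reboot can trigger over an hour of full-throttle network traffic, which is the concern in this thread.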
Liu Yuan namei.u...@gmail.com writes:
On 07/20/2012 02:55 PM, Dietmar Maurer wrote:
[brief maintenance on a node causes automatic recovery]
Such a large amount of data utilizes the network at 100% until the
rebooted node comes up again.
Is that expected behavior?
Yes, for now.
On 07/20/2012 03:33 PM, Chris Webb wrote:
On 07/20/2012 04:59 PM, Dietmar Maurer wrote:
Re-balancing always involves massive network traffic, so IMHO this must be
manually triggered.
If someone wants to do that automatically, he can write a script.

No, most of the time we actually need automatic recovery, because Sheepdog
is targeted for [...]
On 07/20/2012 05:18 PM, Dietmar Maurer wrote:
I fully understand that. You have thousands of nodes and unlimited
network bandwidth.

Unfortunately, corosync only supports 16 nodes (the official limit
supported by Red Hat), and most of our users will run fewer than 5
nodes and use GbE links. So an [...]

Partly yes, but I think an option to disable automatic recovery temporarily
would be better, because I just need it at certain times, not all the time.
Suppose one node out of your reach is down and you need hours to get to it
to do manual recovery; this would put the cluster in danger for hours, even [...]
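One way such a temporary switch could look (a purely hypothetical sketch; `auto_recovery`, `node_change`, and the deferred queue are my invention, not an existing sheepdog option): membership changes seen while recovery is disabled are deferred and replayed when it is re-enabled.

```python
# Hypothetical "disable automatic recovery" switch (not real sheepdog code).
class Cluster:
    def __init__(self):
        self.auto_recovery = True
        self.pending = []     # membership changes deferred while disabled
        self.recovered = []   # epochs for which recovery has run

    def node_change(self, epoch):
        if self.auto_recovery:
            self.recovered.append(epoch)  # normal case: recover immediately
        else:
            self.pending.append(epoch)    # deferred: quiet network, but the
                                          # data sits under-replicated meanwhile

    def enable(self):
        self.auto_recovery = True
        while self.pending:               # replay deferred changes in order
            self.recovered.append(self.pending.pop(0))

c = Cluster()
c.auto_recovery = False    # admin flips the switch before planned maintenance
c.node_change(5)           # node reboots: no recovery traffic yet
c.enable()                 # maintenance done: recovery runs once
print(c.recovered)         # -> [5]
```

The comment in `node_change` is exactly the objection raised in the reply: while the switch is off, objects stay below their target copy count.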
On 07/20/2012 07:09 PM, Dietmar Maurer wrote:
Maybe we can delay the start of recovery for some time (1 h)? That way
a normal server reboot does no harm.

Then how do you handle IOs routed to the down node if you don't recover
the membership state?

[...] like 'recovery in progress'? It simply [...]
I meant IOs from VMs; you can't simply delay the recovery process. For example,
object OBJ has three copies on nodes A, B, C. Suppose that B is down: before it
starts up again, how do you handle requests on OBJ? And when B is back, how
do you hand off the updates back to B for OBJ?
On 07/20/2012 07:28 PM, Dietmar Maurer wrote:
For now, we will get (1) an epoch_mismatch error, or (2) an 'OBJ is being
recovered' error for this case. The gateway node will retry the request
when the epoch matches (1); the targeted node will re-queue the request
locally once OBJ is recovered (2).
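A toy model of those two paths (names and return codes invented for illustration; the real logic lives in sheep's request handling, not in this sketch):

```python
# Case 1: the gateway's epoch is stale -> refresh and retry.
# Case 2: the object is still being recovered -> the target parks the request.
class Node:
    def __init__(self):
        self.epoch = 2            # cluster epoch after the membership change
        self.recovered = set()    # oids already pulled onto this node
        self.parked = []          # requests waiting for recovery to catch up

    def handle(self, oid, req_epoch):
        if req_epoch != self.epoch:
            return "epoch_mismatch"    # case 1: caller must refresh and retry
        if oid not in self.recovered:
            self.parked.append(oid)    # case 2: re-queue locally, don't fail
            return "queued"
        return "ok"

def gateway_io(node, oid):
    epoch = 1                     # gateway still holds the pre-change epoch
    while True:
        res = node.handle(oid, epoch)
        if res == "epoch_mismatch":
            epoch = node.epoch    # refresh membership view and retry
            continue
        return res

node = Node()
print(gateway_io(node, 0xdeadbeef))         # -> queued (parked, not failed)
node.recovered.add(0xdeadbeef)              # recovery finishes this object
print(node.handle(0xdeadbeef, node.epoch))  # -> ok
```

The key point is that neither case surfaces an error to the VM; the request is retried or parked until it can succeed.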
On 07/20/2012 08:50 PM, Dietmar Maurer wrote:
Let's assume a complete recovery takes about 2 hours. Does that mean
my VMs are blocked for 2 hours (instead of continuing to operate on other
nodes)?

This is actually why we spent lots of lines on the recovery and IO patches;
there are mechanisms such as request retry and oid scheduling [...]
Nope, at most dozens of seconds, as I observed.
Subject: RE: [sheepdog] [PATCH] sheep: add a kill node operation
Nope, at most dozens of seconds, as I observed.
So you can complete a request before recovery is complete?
--
sheepdog mailing list
sheepdog@lists.wpkg.org
http://lists.wpkg.org/mailman/listinfo/sheepdog
Yes, this is why we have so many lines.
Please can you point me to that code? (I am unable to find the relevant
files/functions.)
From: Liu Yuan [mailto:namei.u...@gmail.com]
Sent: Friday, 20 July 2012 16:37
To: Dietmar Maurer
Cc: Chris Webb; sheepdog@lists.wpkg.org
Subject: RE: [sheepdog] [PATCH] sheep: add a kill node operation
Thanks.

I'm away from the keyboard; you can find most of the code in request.c and
recovery.c.
So you basically prioritize recovery when an object is requested, and delay
recovery of unused objects?

From what I see, this could also work with the suggested 'recovery_delay'
(you can always recover requested [...]
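That prioritisation can be sketched like this (my reconstruction of the idea, not the actual request.c/recovery.c code): recovery works through a list of objects, but an incoming I/O request promotes its oid to the front of the queue.

```python
from collections import deque

# Sketch: background recovery drains a queue of oids; a client request
# moves its oid to the front so the VM waits seconds instead of hours.
class RecoveryQueue:
    def __init__(self, oids):
        self.todo = deque(oids)
        self.done = set()

    def on_io_request(self, oid):
        if oid in self.todo:           # hot object: recover it next
            self.todo.remove(oid)
            self.todo.appendleft(oid)

    def step(self):                    # recover one object per step
        if self.todo:
            self.done.add(self.todo.popleft())

q = RecoveryQueue(range(100))
q.on_io_request(42)    # a VM touches object 42 mid-recovery
q.step()               # next recovery step handles 42 first
print(42 in q.done)    # -> True
```

Under this scheme only untouched objects wait for the bulk background copying, which is why a 'recovery_delay' for the background pass would not hurt requested objects.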
That still means that IO on the KVM side is extremely slow during the (2 h)
recovery?
From: Liu Yuan [mailto:namei.u...@gmail.com]
Sent: Friday, 20 July 2012 16:18
To: Dietmar Maurer
Cc: sheepdog@lists.wpkg.org; Chris Webb
Subject: RE: [sheepdog] [PATCH] sheep: add a kill node operation
Nope, at most dozens of seconds, as I observed.
This command is supposed to shut down the specified node correctly.

So when we shut down a node, do all the other nodes start object recovery
immediately?

- Dietmar
On 07/20/2012 01:21 PM, Dietmar Maurer wrote:
So when we shut down a node, do all the other nodes start object recovery
immediately?
Yes.
Thanks,
Yuan