MORITA Kazutaka morita.kazut...@lab.ntt.co.jp writes:
At Mon, 24 Dec 2012 15:53:02 +0800,
Liu Yuan wrote:
On 12/24/2012 03:51 PM, Hitoshi Mitake wrote:
Do you mean the file name should be keeper.c and the name of the executable
file should be keeper?
Yes, I think so. What do you think?
MORITA Kazutaka morita.kazut...@lab.ntt.co.jp writes:
At Mon, 1 Oct 2012 13:53:55 +0100,
Chris Webb wrote:
I remember that loopback iscsi and nbd are very prone to deadlock under
memory pressure, because more dirty pages need to be created to be able to
progress with writing out the existing ones. Presumably a kernel
A colleague and I have been discussing the possibility of using Sheepdog for
the storage backing physical hosts as well as qemu virtual machines. It
feels like it wouldn't be particularly hard to take the relatively simple
qemu - sheep protocol defined in qemu/block/sheepdog.c and write a kernel
Liu Yuan namei.u...@gmail.com writes:
On 07/20/2012 02:55 PM, Dietmar Maurer wrote:
[brief maintenance on a node causes automatic recovery]
Such a large amount of data utilizes the network at 100% until the
rebooted node comes up again.
Is that expected behavior?
Yes, for now.
If you're going to do this, might it be better to test for the existence of
the typical obj, journal, epoch directories rather than mandate the
existence of a magic file with a long name and an ugly underscore?
However, it still feels like you should just be doing
[ -d /store/journal ] && sheep
Dietmar Maurer diet...@proxmox.com writes:
If you're going to do this, might it be better to test for the existence
of the typical obj, journal, epoch directories rather than mandate the
existence of a magic file with a long name and an ugly underscore?
That file should be a 'unique
Dietmar Maurer diet...@proxmox.com writes:
normally people mount sheepdog data disks in fstab. That way the disks
get mounted at boot. For example, we have the following mounts:
/dev/sda == /
/dev/sdb == /var/lib/sheepdog
The problem occurs when mounting the sheepdog data disk fails
MORITA Kazutaka morita.kazut...@lab.ntt.co.jp writes:
If you specify zero for the number of virtual nodes, no data will
be stored on the node.
This looks great to me: gives us a nice way to deal with varying disk sizes
across a cluster.
If I set -v 0, presumably the store directory is
MORITA Kazutaka morita.kazut...@lab.ntt.co.jp writes:
Sorry, I couldn't reproduce it. I'll keep this problem in mind, but I'm
thinking of releasing 0.3.0 because it seems that the fatal blocking
problem doesn't happen if you use corosync 1.3.x.
Hi. Sorry for the slow response. Yes, I think that's
MORITA Kazutaka morita.kazut...@lab.ntt.co.jp writes:
I've sent some fixes related to network failure. Can you try with the
devel branch again?
Hi. I've just retried with this updated version.
When I ran with corosync-1.4.2, the remaining cluster just hung (apparently
forever) without ever
Perhaps something like this would fit the bill?
-- >8 --
Subject: [PATCH] Don't report an error for blocks not stored locally
Signed-off-by: Chris Webb ch...@arachsys.com
---
sheep/simple_store.c |8 +---
1 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/sheep
MORITA Kazutaka morita.kazut...@lab.ntt.co.jp writes:
Chris Webb wrote:
If the failed node is just partitioned away from the rest of the cluster
rather than failing, what's supposed to happen to the sheep instances and
the qemus on it? I saw operations hang indefinitely, which
MORITA Kazutaka morita.kazut...@lab.ntt.co.jp writes:
This patchset makes collie's I/Os like QEMU's ones, and adds support
for automatic retry of collie commands.
Chris, can you try the devel branch?
Hi. This seems to work very nicely. I made a three node cluster, created and
started writing
At the moment, I think that an IO operation from a failed disk will make the
corresponding sheep call leave_cluster(), dropping into a gateway mode where
it forwards IO operations for the qemu processes attached to it, but doesn't
store data any more, and presumably isn't considered part of the
Chris Webb ch...@arachsys.com writes:
The changes apply to lib/* and to sheep/*, but do not affect the collie tool,
for which I'll prepare a follow-up series.
Sorry it's taken me so long to follow up on this, but I now have a final
patch to finish this process off. I've been through
as 'VDI' not 'vdi' nor 'Vdi' in messages and comments.
Signed-off-by: Chris Webb ch...@arachsys.com
---
collie/cluster.c | 36 ++
collie/collie.c | 61 +++-
collie/common.c | 20 +++---
collie/node.c| 16 ++---
collie/treeview.c
Signed-off-by: Chris Webb ch...@arachsys.com
---
sheep/cluster/accord.c|2 +-
sheep/cluster/zookeeper.c |2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/sheep/cluster/accord.c b/sheep/cluster/accord.c
index 337f631..a685f9e 100644
--- a/sheep/cluster/accord.c
MORITA Kazutaka morita.kazut...@lab.ntt.co.jp writes:
I couldn't reproduce this. On my environment, the last 3 nodes
stopped correctly with a network partition error. Perhaps, is this a
corosync problem?
Could be. I'm running corosync 1.3.0 here. I'll grab the latest head of
corosync git
MORITA Kazutaka morita.kazut...@lab.ntt.co.jp writes:
I finished all of the cluster driver implementation we planned, so I'm
thinking of releasing 0.3.0 this weekend. If you have pending patches
for 0.3.0, please send them by Nov 18th. I'll spend this week
testing Sheepdog.
Hi. I thought I
This is a completely trivial patch series which just addresses things like
variation between capitalized and non-capitalized messages sent to syslog,
standardizing different wordings and styles for the same error message,
clarifying wording, and so on. It should have no functional effects
Signed-off-by: Chris Webb ch...@arachsys.com
---
sheep/cluster/corosync.c | 14 +++---
sheep/group.c| 18 +-
sheep/sdnet.c|2 +-
sheep/store.c|4 ++--
sheep/vdi.c | 16
5 files changed, 27
We standardize on wording of the form 'failed to ...: ' followed by the
strerror() message, and where possible, we follow the bulk of the existing
code in using %m rather than %s with strerror(errno).
Signed-off-by: Chris Webb ch...@arachsys.com
---
lib/event.c | 14 +++---
lib
Signed-off-by: Chris Webb ch...@arachsys.com
---
sheep/sheep.c | 56 +++-
1 files changed, 27 insertions(+), 29 deletions(-)
diff --git a/sheep/sheep.c b/sheep/sheep.c
index 833a5cb..920495a 100644
--- a/sheep/sheep.c
+++ b/sheep/sheep.c
Signed-off-by: Chris Webb ch...@arachsys.com
---
lib/event.c |4 ++--
lib/logger.c | 15 +++
lib/net.c|2 +-
sheep/cluster/corosync.c | 27 +--
sheep/group.c| 40
Signed-off-by: Chris Webb ch...@arachsys.com
---
include/logger.h |4 ++--
lib/logger.c |8
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/include/logger.h b/include/logger.h
index 461f2d9..bcc9c26 100644
--- a/include/logger.h
+++ b/include/logger.h
@@ -47,8
MORITA Kazutaka morita.kazut...@lab.ntt.co.jp writes:
At Sat, 29 Oct 2011 11:10:29 +0100,
Chris Webb wrote:
MORITA Kazutaka morita.kazut...@lab.ntt.co.jp writes:
+const char *shmfile = "/tmp/sheepdog_shm";
[...]
+ shmfd = open(shmfile, O_CREAT | O_RDWR, 0644);
Even though this is just a testing driver, this does make me a little bit
nervous. For instance, as a malicious user of your machine, I
Hi Kazutaka. I pulled these fixes (your devel branch is master + these fixes
at the moment) and rebuilt. However, I'm afraid I'm still seeing flaky
operation.
I do a successful
collie vdi create 7fa6adbe-2551-4a60-ab50-c901c972d11d 539545600
then a bunch of successful setattr, getattr and
MORITA Kazutaka morita.kazut...@lab.ntt.co.jp writes:
Yes, I pushed many patches which simplify cluster communications, so
the problem might be solved with the current master branch. Anyway,
I'll try to find what caused the problem. :)
Hi Kazutaka. I pulled the current head of master,
MORITA Kazutaka morita.kazut...@lab.ntt.co.jp writes:
Thanks for your testing! There was a trivial bug in collie/vdi.c.
I've sent a patch and pushed it to vdiattr branch. It may solve all
of your problems.
Hi. Sorry for the slow reply to this. I've been testing a newly built
checkout from
MORITA Kazutaka morita.kazut...@lab.ntt.co.jp writes:
Thanks, the reason for this problem is that you are using a direct I/O
option but the offset and length of collie vdi write are not aligned to
the sector size (512 bytes). I didn't expect that, because a VM's I/O
requests are always sector aligned.
MORITA Kazutaka morita.kazut...@lab.ntt.co.jp writes:
Yes, as long as setattr -x is run on the same machine. Note that
Sheepdog object storage doesn't allow concurrent accesses from
multiple machines.
Hi Kazutaka. For this to apply to setattr -x makes the exclusiveness of the
operation much
MORITA Kazutaka morita.kazut...@lab.ntt.co.jp writes:
At Thu, 13 Oct 2011 22:00:05 +0900,
MORITA Kazutaka wrote:
At Thu, 13 Oct 2011 13:35:06 +0100,
Chris Webb wrote:
MORITA Kazutaka morita.kazut...@lab.ntt.co.jp writes:
Sheepdog uses a corosync multicast for all global
MORITA Kazutaka morita.kazut...@lab.ntt.co.jp writes:
Chris Webb wrote:
Hi Kazutaka. Just double-checking, but is there a race here where the id is
allocated but the key isn't written yet, i.e. a getattr on another host
could see a value for the attribute but that value is an empty string
Hi. We've finished porting our infrastructure management system to live
entirely on top of Sheepdog, and have begun doing some testing as a result.
We use setattr -x to implement locking in the way we've previously
discussed, and I've noticed a few consistency problems.
Here's a first, simple
Signed-off-by: Chris Webb ch...@arachsys.com
---
collie/vdi.c |9 -
1 files changed, 8 insertions(+), 1 deletions(-)
diff --git a/collie/vdi.c b/collie/vdi.c
index 070539c..e6c32fc 100644
--- a/collie/vdi.c
+++ b/collie/vdi.c
@@ -1012,8 +1012,15 @@ static int vdi_read(int argc, char
Signed-off-by: Chris Webb ch...@arachsys.com
---
collie/vdi.c |4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/collie/vdi.c b/collie/vdi.c
index 204e13a..6b2f26c 100644
--- a/collie/vdi.c
+++ b/collie/vdi.c
@@ -1113,7 +1113,9 @@ static int vdi_write(int argc, char
Signed-off-by: Chris Webb ch...@arachsys.com
---
collie/vdi.c |9 -
1 files changed, 8 insertions(+), 1 deletions(-)
diff --git a/collie/vdi.c b/collie/vdi.c
index 6b2f26c..ef386ea 100644
--- a/collie/vdi.c
+++ b/collie/vdi.c
@@ -1097,8 +1097,15 @@ static int vdi_write(int argc, char
Usage:
collie vdi read vdiname [offset [len]] [-a address] [-p port] [-h]
If len is not specified, we write from offset to the end of the vdi or EOF
on STDIN, whichever is reached first. If offset is also not specified, we
write from the start of the vdi.
Signed-off-by: Chris Webb ch
MORITA Kazutaka morita.kazut...@lab.ntt.co.jp writes:
collie/vdi.c| 209 --
Hi. I think there might be a missing patch here, since this won't apply
because there isn't a collie/vdi.c in the current git tree?
Best wishes,
Chris.
--
MORITA Kazutaka morita.kazut...@lab.ntt.co.jp writes:
Thank you for your testing. I've implemented another approach.
Could you try again?
Hi Kazutaka. Yes, this one works fine, and survives the deletion of
snapshots nicely.
The vdi id can change if we get the snapshot, so there is a problem
MORITA Kazutaka morita.kazut...@lab.ntt.co.jp writes:
Sheepdog also uses a hash value of the vdi name to look up vdi
objects, so it is already difficult to implement vdi rename simply,
but I think it is not impossible. For example, if we log all the vdi
rename operations, we can traverse the
Another operation I could do with exposing to end users from the new
Sheepdog branch of our platform is drive read and write. From our existing
drives, we provide API calls to
- read N bytes (or the remainder of the drive) starting at offset M
- write bytes starting at offset M
as well as
MORITA Kazutaka morita.kazut...@lab.ntt.co.jp writes:
At Wed, 3 Aug 2011 13:50:32 +0100,
Chris Webb wrote:
MORITA Kazutaka morita.kazut...@lab.ntt.co.jp writes:
Sheepdog also uses a hash value of the vdi name to look up vdi
objects, so it is already difficult to implement vdi
Chris Webb ch...@arachsys.com writes:
MORITA Kazutaka morita.kazut...@lab.ntt.co.jp writes:
Let me clarify some points.
In this approach, when we rename a vdi, we specify the vdi 'id' and
change the attribute of the vdi, yes?
What is specified in the qemu command line option
Previously, a general EXIT_FAILURE was returned in this case, which is hard to
distinguish from other cluster failures.
Signed-off-by: Chris Webb ch...@arachsys.com
---
collie/collie.c |6 ++
1 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/collie/collie.c b/collie/collie.c
MORITA Kazutaka morita.kazut...@lab.ntt.co.jp writes:
I don't know of a way to include tabs or newlines in a program
argument string. I think it is enough to handle only ' ' and '\'.
In the end, I backslash quoted all whitespace in the patch you merged,
because it does appear to be possible
However, is it possible to make this and other VDI operations properly
constant- or log-time in the number of VDIs instead of linear?
Signed-off-by: Chris Webb ch...@arachsys.com
---
collie/collie.c | 58 +-
1 files changed, 35 insertions
Chris Webb ch...@arachsys.com writes:
@@ -387,8 +396,9 @@ static void print_vdi_list(uint32_t vid, char *name, char
*tag, uint32_t snapid,
We need at least
- char vdi_size_str[8], my_objs_str[8], cow_objs_str[8];
+ char vdi_size_str[16], my_objs_str[16], cow_objs_str[16
Signed-off-by: Chris Webb ch...@arachsys.com
---
collie/collie.c |4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/collie/collie.c b/collie/collie.c
index 606cb30..f253bbc 100644
--- a/collie/collie.c
+++ b/collie/collie.c
@@ -1180,7 +1180,7 @@ static int cluster_parser
and attempt to return these consistently for all collie commands.
Signed-off-by: Chris Webb ch...@arachsys.com
---
collie/collie.c | 94 --
include/exits.h | 12 +++
2 files changed, 61 insertions(+), 45 deletions(-)
create mode 100644
MORITA Kazutaka morita.kazut...@lab.ntt.co.jp writes:
@@ -928,12 +932,13 @@ reread:
if (ret) {
if (ret == SD_RES_VDI_EXIST) {
fprintf(stderr, "the attribute already exists, %s\n",
key);
+ return EXIT_BUSY;
Is EXIT_BUSY suitable
MORITA Kazutaka morita.kazut...@lab.ntt.co.jp writes:
At Thu, 16 Jun 2011 17:01:40 +0100,
Chris Webb wrote:
} else if (ret == SD_RES_NO_OBJ) {
fprintf(stderr, "no such attribute, %s\n", key);
} else
and attempt to return these consistently for all collie commands.
Signed-off-by: Chris Webb ch...@arachsys.com
---
collie/collie.c | 97 +--
include/exits.h | 12 +++
2 files changed, 63 insertions(+), 46 deletions(-)
create mode
MORITA Kazutaka morita.kazut...@lab.ntt.co.jp writes:
@@ -387,8 +396,9 @@ static void print_vdi_list(uint32_t vid, char *name,
char *tag, uint32_t snapid,
size_to_str(cow_objs * SD_DATA_OBJ_SIZE, cow_objs_str,
sizeof(cow_objs_str));
if (!data || strcmp(name, data) == 0) {
Signed-off-by: Chris Webb ch...@arachsys.com
---
collie/collie.c |2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/collie/collie.c b/collie/collie.c
index e0ecc78..606cb30 100644
--- a/collie/collie.c
+++ b/collie/collie.c
@@ -900,7 +900,7 @@ static int vdi_setattr(int
Haven ha...@thehavennet.org.uk writes:
Running the same test here on the virtual I'm getting:
524288000 bytes (524 MB) copied, 15.2004 s, 34.5 MB/s
Running that on the underlying drive of one of the cluster I get:
524288000 bytes (524 MB) copied, 7.54742 s, 69.5 MB/s
Yes, that's much more
MORITA Kazutaka morita.kazut...@lab.ntt.co.jp writes:
I'm not familiar with btrfs mount options, but if a raw image shows
good performance on the same file system, I think this is a problem
with Sheepdog. To be honest, I don't have the slightest idea why
Sheepdog shows such bad results in
Hi. I'm looking at both Sheepdog and Ceph at the moment, and thinking about
future directions for our hosting product. We run qemu-kvm virtual machines
backed by LVM2 logical volumes as virtual drives, accessed either locally or
over iscsi. I'm thinking of migrating in time to a distributed block
MORITA Kazutaka morita.kazut...@lab.ntt.co.jp writes:
On 12/04/2009 10:22 PM, Chris Webb wrote:
I wonder how expensive it actually is to take and release the lock now?
Potentially it could already be quite cheap if corosync performs well and
given that dog is now in C...
According
In the spirit of merging dog and sheep, do you think it would also be worth
pulling the sheepdog client code into the sheepdog tree instead of putting
it directly into qemu/block/sheepdog.c?
If sd_open, sd_aio_readv, etc. were in a small libsheepdog rather than being
part of qemu, other programs