Re: remote mirroring in the works?

2010-09-06 Thread K. Richard Pixley

 On 20100906 14:50, David Nicol wrote:

Only off-topic if BTRFS isn't ever going to ooze into the space
currently occupied by the likes of

http://en.wikipedia.org/wiki/Global_File_System

that is, file systems that have multiple nodes simultaneously
accessing block devices and tolerating faults.


There seem to be a number of other systems looking at building fault
tolerance and distribution over btrfs: CRFS, Ceph, Lustre.  I'm
convinced that will happen even if btrfs doesn't do it natively.


Btrfs could probably be built to stripe and/or mirror over several nbd 
devices now, although I haven't tried it.


--rich
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: remote mirroring in the works?

2010-08-31 Thread Simon Kirby
On Mon, Aug 30, 2010 at 11:14:51AM -0700, K. Richard Pixley wrote:

  On 20100830 10:59, Roy Sigurd Karlsbakk wrote:
 I think drbd does precisely what you want.

 It's not useful for fault tolerance, nor for load balancing, but it
 will produce a remote block copy that can be used as a sort of hot
 backup.

 drbd with heartbeat/pacemaker can provide fault tolerance...

 I think that's a matter of semantics.

 Once you've failed over from the primary system to the secondary,
 changes to your block device are terminal.  It's not easy to produce a
 system which can manage those changes and heal in the sense of
 allowing the primary system to return to service.  In effect, returning
 the primary system to service requires taking both systems down and
 copying the block device from the secondary back to the first.

This is totally incorrect.  DRBD replicates in both directions quite
well, in fact.  I've been using it on about 60 machines for many years,
and I have never had to do what you mention.

What it does not help with is avoiding corruption that occurs above the
block layer; eg, if your file system or your database on top of it barfs,
there is no other good copy.  fsck or repair is still required in these
cases.  It is just like local RAID 1 in this respect -- you still need a
backup and/or copy at the file level, which is closer to what is needed
here.

Simon-


Re: remote mirroring in the works?

2010-08-31 Thread Simon Kirby
On Tue, Aug 31, 2010 at 07:07:29AM +0200, Fred van Zwieten wrote:

 Hmmm, maybe, but rsync would take a lot of time to find the changes.
 The actual blocks of a snap _are_ the changes; that's why SnapMirror
 is very efficient. And I don't see how rsync will retain the snaps
 at both sites. It would be great if a tool like rsync could have
 access to the changed blocks in a snap. I don't know if btrfs exposes
 these somehow.

rsync doesn't have the hinting required to do this efficiently.  It has
to scan the whole thing every time it is run, and isn't anything like a
continuous replication in this respect.  Also, we've had problems in the
past with very large file systems causing rsync to run out of memory,
because it builds a file list in memory.  This led us to build a cpbk
tool that basically did the same thing without file lists, which turned
out to be a piece of crap, so some other guy kindly rewrote it, but he
unfortunately missed the original point entirely and rewrote it using
file lists.  Sigh.

Anyway, there _is_ this interface:

btrfs subvolume find-new path last_gen
List the recently modified files in a filesystem.

Eg:

btrfs sub find-new /mnt 0

This should print all files on the file system, followed by the last
transaction ID marker.  That marker can be passed as last_gen on the next
call, which then lists only what has changed since that ID.

So, it might be pretty easy to glue these tools together, for now, until
something does this automatically and/or in some more efficient or
low-level way.
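
A minimal sketch of that glue, in Python. It assumes that find-new prints
one "inode ... flags <FLAGS> <path>" line per changed extent and ends with
a "transid marker was <N>" line; that output format is an assumption and
may differ between btrfs-progs versions. The function name parse_find_new
and the sample text are hypothetical.

```python
import re

# Assumed line formats (not confirmed anywhere in this thread):
#   inode 257 file offset 0 len 4096 disk start 0 offset 0 gen 8 flags NONE dir/file
#   transid marker was 12
_EXTENT_RE = re.compile(
    r"^inode \d+ file offset \d+ len \d+ disk start \d+ "
    r"offset \d+ gen \d+ flags \S+ (?P<path>.+)$"
)
_MARKER_RE = re.compile(r"^transid marker was (?P<gen>\d+)$")


def parse_find_new(output):
    """Return (sorted unique paths, transid marker or None)."""
    paths = set()
    marker = None
    for line in output.splitlines():
        line = line.rstrip()
        m = _EXTENT_RE.match(line)
        if m:
            # A file appears once per changed extent; keep it only once.
            paths.add(m.group("path"))
            continue
        m = _MARKER_RE.match(line)
        if m:
            marker = int(m.group("gen"))
    return sorted(paths), marker


# Hypothetical sample output, used only to exercise the parser.
SAMPLE = """\
inode 257 file offset 0 len 4096 disk start 12582912 offset 0 gen 8 flags NONE db/table.dat
inode 257 file offset 4096 len 4096 disk start 12587008 offset 0 gen 9 flags NONE db/table.dat
inode 260 file offset 0 len 120 disk start 0 offset 0 gen 9 flags INLINE notes with space.txt
transid marker was 9
"""

if __name__ == "__main__":
    changed, marker = parse_find_new(SAMPLE)
    print(changed)
    print(marker)
```

The returned path list could be written to a file and handed to rsync with
its --files-from option, and the marker stored as last_gen for the next run.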

Simon-


Re: remote mirroring in the works?

2010-08-31 Thread Goffredo Baroncelli
On Tuesday, 31 August, 2010, Simon Kirby wrote:
[...]
 Anyway, there _is_ this interface:
 
   btrfs subvolume find-new path last_gen
   List the recently modified files in a filesystem.
 
 Eg:
 
   btrfs sub find-new /mnt 0
 
 This should print all files on the file system, and the last transaction
 ID marker.  This can be used to call the interface again, which lists
 only new changed things since that ID.
  

That is not fully correct. In fact, Chris Mason says

(from http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg04620.html)


Chris When we find an inode in the output, it doesn't mean that inode has
Chris changed.  It just means the btree block holding that inode has changed.
Chris So we'll want to add limiting based on the ctime/mtime of the inode as
Chris well.

So even though this command definitely helps, false positives may happen.
Moreover, an empty file is not detected (I think because the file doesn't
have associated data). But I think that this may be easily corrected.
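
Until such limiting exists in the tool itself, it could be approximated in
user space. A minimal Python sketch, assuming the candidate paths come from
find-new and that the time of the previous replication run was recorded;
the function name and its parameters are hypothetical.

```python
import os


def really_changed(root, candidates, last_sync_time):
    """Keep only candidates whose mtime or ctime is newer than last_sync_time.

    root            -- directory the relative candidate paths are resolved against
    candidates      -- iterable of relative paths (may contain false positives)
    last_sync_time  -- seconds since the epoch of the previous replication run
    """
    changed = []
    for rel in candidates:
        full = os.path.join(root, rel)
        try:
            st = os.lstat(full)
        except FileNotFoundError:
            # Listed but deleted in the meantime: nothing to copy.
            continue
        if max(st.st_mtime, st.st_ctime) > last_sync_time:
            changed.append(rel)
    return changed
```

This only removes false positives from the candidate list; it does not find
files that were never listed in the first place.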

Regards
G.Baroncelli

-- 
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) kreij...@inwind.it
Key fingerprint = 4769 7E51 5293 D36C 814E  C054 BF04 F161 3DC5 0512


Re: remote mirroring in the works?

2010-08-31 Thread Fred van Zwieten
Thinking about this a bit more, would a setup with btrfs on top of
DRBD come in the neighborhood of what SnapMirror provides? DRBD does
replication at the block level, without any notion of a filesystem on
top of it (as I understand it). So, if I make a snapshot on a DRBD'ed
btrfs filesystem, this snapshot would also get replicated at the DRBD
level. Provided I put the DB in a consistent state before making the
snap, I have a remote consistent copy of this DB. This copy can be used
as a failover target or as a basis for restore.

Am I correct?




Re: remote mirroring in the works?

2010-08-30 Thread Fred van Zwieten
Hi there,

I would like to know if there is something functionally equivalent to
NetApp's SnapMirror in the works or planned? It would require block
level access to a snap and the ability to rebuild subvolumes,
including their snaps, on another machine.

If not, what would be the best way to build something more or less
equivalent using existing tools? rsync-ing a snap seems the same, but
it isn't. First of all, it's file based, which is not very nice for DBs,
and you don't get the same snaps on the other side.

Fred


Re: remote mirroring in the works?

2010-08-30 Thread Bryan Whitehead
LVM Snapshot.

lvcreate -s -n SnapShotName /dev/VolumeGroup/SourceLogicalVolumeName

You may need to pass -l or -L to give an initial size for the COW.

(As for rebuilding on another machine, that would require shared
storage or additional LVM tricks to export/import, or good old
fashioned dd.)

That said, a more appropriate list to ask is linux-...@redhat.com



Re: remote mirroring in the works?

2010-08-30 Thread Roy Sigurd Karlsbakk
- Original Message -
 Hi there,
 
 I would like to know if there is something functionally equivalent to
 NetApp's SnapMirror in the works or planned? It would require block
 level access to a snap and the ability to rebuild subvolumes,
 including their snaps, on another machine.
 
 If not, what would be the best way to build something more or less
 equivalent using existing tools? rsync-ing a snap seems the same, but
 it isn't. First of all, it's file based, which is not very nice for DBs,
 and you don't get the same snaps on the other side.

Perhaps DRBD - see http://www.drbd.org/ - that'll mirror the block device(s) on 
which btrfs resides.

Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly.
It is an elementary imperative for all pedagogues to avoid excessive use of
idioms of foreign origin. In most cases, adequate and relevant synonyms exist
in Norwegian.


Re: remote mirroring in the works?

2010-08-30 Thread K. Richard Pixley

 On 20100830 10:07, Fred van Zwieten wrote:

Hi there,

I would like to know if there is something functionally equivalent to
NetApp's SnapMirror in the works or planned? It would require block
level access to a snap and the ability to rebuild subvolumes,
including their snaps, on another machine.

If not, what would be the best way to build something more or less
equivalent using existing tools? rsync-ing a snap seems the same, but
it isn't. First of all, it's file based, which is not very nice for DBs,
and you don't get the same snaps on the other side.

Fred

I think drbd does precisely what you want.

It's not useful for fault tolerance, nor for load balancing, but it will 
produce a remote block copy that can be used as a sort of hot backup.


You can also do something very similar by combining LVM (the logical
volume manager) with LVM snapshots and NBD (the network block device),
mirroring to an NBD device.


Neither of these approaches can tolerate the remote file system being 
live until and unless it takes over for the primary.  But either can 
maintain a dynamic remote block device.


--rich


Re: remote mirroring in the works?

2010-08-30 Thread Roy Sigurd Karlsbakk
 I think drbd does precisely what you want.
 
 It's not useful for fault tolerance, nor for load balancing, but it
 will produce a remote block copy that can be used as a sort of hot
 backup.

drbd with heartbeat/pacemaker can provide fault tolerance...

Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/


Re: remote mirroring in the works?

2010-08-30 Thread K. Richard Pixley

 On 20100830 10:59, Roy Sigurd Karlsbakk wrote:

I think drbd does precisely what you want.

It's not useful for fault tolerance, nor for load balancing, but it
will produce a remote block copy that can be used as a sort of hot
backup.

drbd with heartbeat/pacemaker can provide fault tolerance...

I think that's a matter of semantics.

Once you've failed over from the primary system to the secondary, 
changes to your block device are terminal.  It's not easy to produce a 
system which can manage those changes and heal in the sense of 
allowing the primary system to return to service.  In effect, returning 
the primary system to service requires taking both systems down and 
copying the block device from the secondary back to the first.


In terms of fault tolerance, I'd call this a tolerance of about half a
fault, since the system cannot return to its initial configuration
without breaking continuity of service.


And there really isn't any way to extend this. It's not fault tolerance 
in the virtual synchrony sense where there can be a pool of N machines, 
all symmetric, which can tolerate N - 1 failures and produce continuing 
service throughout.


It's also not load balanced in the virtual synchrony sense where N 
machines can all be in service concurrently and the service can tolerate 
N - 1 failures, albeit at degraded performance.  Or in the sense where 
failed servers can return to the group dynamically.


It's not sufficient for any application in which I've ever sought fault
tolerance.  If it's sufficient for you, that's great.  But my definition
of fault tolerance requires that the system be capable of returning to
its initial state without loss of service.  The heartbeat approach with
single failover can't do that.


--rich - who is likely now off topic.


Re: remote mirroring in the works?

2010-08-30 Thread Fred van Zwieten
I just glanced over the DRBD/LVM combination, but I don't see it being
functionally equal to SnapMirror. Let me (try to) explain how
SnapMirror works:

On system A there is a volume (vol1). We let this vol1(A) replicate
through SnapMirror to vol1(B). This is done by creating a snap vol1sx(A)
and replicating all changed blocks between this snapshot (x) and the
previous snapshot (x-1). The first time, there is no x-1 and the whole
volume will be replicated, but after this initial full copy, only
the changed blocks between the two snapshots are replicated to
system B. This is also called snap-based replication. Why do we want
this? Easy: to support consistent DB snaps. The process works by first
putting the DB in a consistent mode (depends on the DB implementation),
creating a snapshot, letting the DB continue, and replicating the
changes. This way a DB-consistent state will be replicated. The cool
thing about the NetApp implementation is that on system B the snaps
(x, x-1, x-2, etc.) are also available. When there is trouble, you can
choose to bring the DB online on system B on any of the snaps, or, even
cooler, to replicate one of those snaps back to system A, doing a
block-based rollback at the filesystem level.
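
The replication cycle described above can be modelled in a few lines of
Python. This is a toy sketch under stated assumptions, not how SnapMirror
is implemented: snapshots are modelled as dicts mapping block number to
bytes, and the delta is found by comparing blocks, whereas a real
copy-on-write system would read the changed-block list from its own
metadata. The function names are hypothetical.

```python
def snapshot_delta(prev, cur):
    """Blocks that differ between snapshot x-1 (prev) and snapshot x (cur).

    Snapshots are modelled as dicts mapping block number -> bytes.
    prev may be None for the first transfer, which then carries every block.
    """
    prev = prev or {}
    written = {n: data for n, data in cur.items() if prev.get(n) != data}
    freed = sorted(n for n in prev if n not in cur)
    return {"written": written, "freed": freed}


def apply_delta(base, delta):
    """Rebuild snapshot x on the replica from snapshot x-1 plus the delta."""
    snap = dict(base or {})
    snap.update(delta["written"])
    for n in delta["freed"]:
        snap.pop(n, None)
    return snap


if __name__ == "__main__":
    # System A: three successive snapshots of vol1.
    a_snaps = [
        {0: b"hdr", 1: b"row1"},
        {0: b"hdr", 1: b"row1", 2: b"row2"},
        {0: b"hdr", 1: b"ROW1"},
    ]
    # System B keeps every snapshot it receives (x, x-1, x-2, ...).
    b_snaps = []
    prev = None
    for cur in a_snaps:
        delta = snapshot_delta(prev, cur)
        b_snaps.append(apply_delta(b_snaps[-1] if b_snaps else None, delta))
        print(sorted(delta["written"]), delta["freed"])
        prev = cur
    assert b_snaps == a_snaps
```

Because the delta is symmetric in form, the same two functions also cover
the rollback case: computing the delta from a newer snapshot to an older
one and applying it on system A restores the older state.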

Fred



Re: remote mirroring in the works?

2010-08-30 Thread Freddie Cash
On Mon, Aug 30, 2010 at 2:15 PM, Fred van Zwieten fvzwie...@gmail.com wrote:
 I just glanced over the DRBD/LVM combination, but I don't see it being
 functionally equal to SnapMirror. Let me (try to) explain how
 SnapMirror works:

 On system A there is a volume (vol1). We let this vol1(A) replicate
 through SnapMirror to vol1(B). This is done by creating a snap vol1sx(A)
 and replicating all changed blocks between this snapshot (x) and the
 previous snapshot (x-1). The first time, there is no x-1 and the whole
 volume will be replicated, but after this initial full copy, only
 the changed blocks between the two snapshots are replicated to
 system B. This is also called snap-based replication. Why do we want
 this? Easy: to support consistent DB snaps. The process works by first
 putting the DB in a consistent mode (depends on the DB implementation),
 creating a snapshot, letting the DB continue, and replicating the
 changes. This way a DB-consistent state will be replicated. The cool
 thing about the NetApp implementation is that on system B the snaps
 (x, x-1, x-2, etc.) are also available. When there is trouble, you can
 choose to bring the DB online on system B on any of the snaps, or, even
 cooler, to replicate one of those snaps back to system A, doing a
 block-based rollback at the filesystem level.

In the ZFS world, this would be the zfs send and zfs recv
functionality.  Anyone looking for ideas on how it could be implemented
for btrfs in the future may want to read up on how it works over
there.

-- 
Freddie Cash
fjwc...@gmail.com


Re: remote mirroring in the works?

2010-08-30 Thread K. Richard Pixley
 If you can put the db into a consistent state, then rsync will do 
this.  Rsync does changed block transfers.
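
To illustrate what finding changed blocks involves when there is no
snapshot to consult, here is a simplified Python sketch that compares
per-block checksums of two file images. It is not rsync's actual
algorithm (rsync also uses a rolling checksum so that matching data can be
found at unaligned offsets); the function name, block size, and hash
choice are hypothetical.

```python
import hashlib


def changed_blocks(old, new, block_size=4096):
    """Return indices of fixed-size blocks in `new` that differ from `old`."""
    def digests(data):
        # One digest per block; the final block may be shorter.
        return [
            hashlib.sha1(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)
        ]

    old_d = digests(old)
    new_d = digests(new)
    # A block counts as changed if it is new or its digest differs.
    return [
        i for i, d in enumerate(new_d)
        if i >= len(old_d) or old_d[i] != d
    ]
```

Note that both images have to be read and hashed in full before the first
changed block is known.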


--rich



Re: remote mirroring in the works?

2010-08-30 Thread Fred van Zwieten
Hmmm, maybe, but rsync would take a lot of time to find the changes.
The actual blocks of a snap _are_ the changes; that's why SnapMirror
is very efficient. And I don't see how rsync will retain the snaps
at both sites. It would be great if a tool like rsync could have
access to the changed blocks in a snap. I don't know if btrfs exposes
these somehow.

Fred
