Hello Justin,

We just finished implementing an AoE-based SAN that is
replicated between geographically separated datacenters
using DRBD proxy.  Our use of DRBD proxy prevented us
from using a primary-primary setup.  Because of that, we chose
to go with vblade because it runs a single process per
LUN, which makes the transition from primary to secondary
easier.

We are 75% through the migration to the new AoE-based SAN.
Our biggest headache so far has been a corner case we've
hit.  We have vblade (direct io) exporting DRBD devices.
The disks backing the DRBD devices are LVM logical volumes.

We've seen issues where LVM pvmove will not complete and
gets stuck.  DRBD throws

    block drbd11: Local backing block device frozen?

and then the vblade process gets stuck in the D state.
We do not see the same issue if vblade is not running
(i.e. the LUN is secondary).  So we've been migrating
resources between the datacenters quite a bit to allow
us to do all the pvmoves needed to finish the migration
off of our old iSCSI SAN.
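
For anyone who wants to watch for this condition, here is a minimal
sketch of one way to list processes sitting in the D (uninterruptible
sleep) state by reading /proc/<pid>/stat on Linux.  The function names
are hypothetical and this is not a tool from our setup.  It reports
every D-state process, not just vblade, and a process that is only
briefly in D during normal I/O will show up as well.

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    head, _, tail = stat_line.rpartition(")")
    pid, _, comm = head.partition("(")
    return int(pid), comm, tail.split()[0]


def uninterruptible_processes(proc_root="/proc"):
    """Yield (pid, comm) for every process currently in state D."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited, or the stat line was unreadable
        if state == "D":
            yield pid, comm


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

Run periodically during a pvmove, the output would show whether a
vblade process has entered D and stayed there.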

On Wed, Sep 28, 2011 at 12:04:53PM +0200, Justin Albstmeijer wrote:
> Hi aoe users and developers.
> 
> I'm testing a setup that consists of two storage servers, which
> replicate lvm volumes using drbd in master-master mode.
> 
> Both storage servers export the same drbd/aoe block devices using qaoed
> or ggaoed.
> 
> The aoe kernel module, on the client machines which mount the aoe block
> devices (aoe devices are exclusively mounted on a server), load balances
> between the two storage servers.
> 
> All seems to work fine and performance is acceptable.
> 
> The only worry I have is that qaoed or ggaoed might buffer the writes
> before committing them to drbd, causing inconsistency in the replication.
> This could be a problem in normal operation, and certainly would be if
> one of the storage servers powered off unexpectedly without committing
> all its writes to drbd.
> 
> Am I right to worry about this?
> 
> Should direct-io be enabled in the qaoed or ggaoed configuration for
> this reason? I have not tested the performance impact yet on this
> setup, but from other aoe tests I would expect a sharp decrease in
> performance.
> 
> Should I consider not exporting the same drbd/aoe on each storage server
> or investigate if the aoe kernel module can work in fail-over mode to
> limit the possible impact of this non-committed/lost data still in the
> buffers?
> 
> Any advice/feedback is welcome.
> 
> Justin
> 
> 
> 
> 
> ------------------------------------------------------------------------------
> All the data continuously generated in your IT infrastructure contains a
> definitive record of customers, application performance, security
> threats, fraudulent activity and more. Splunk takes this data and makes
> sense of it. Business sense. IT sense. Common sense.
> http://p.sf.net/sfu/splunk-d2dcopy1
> _______________________________________________
> Aoetools-discuss mailing list
> Aoetools-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/aoetools-discuss

-- 
James R. Leu
Software Architect
INOC
608.204.0203
608.663.4555 fax
j...@inoc.com
www.inoc.com

*** DELIVERING UPTIME ***
