Hi Malcolm,

Thanks! That's a very good point. I had almost forgotten that we should prefer 
a cross-failover configuration rather than individual failover.

So, using the same connection type is what we should prefer (to cover failover + 
load distribution).

--
Regards,
Jeevan.

From: Cowe, Malcolm J [mailto:[email protected]]
Sent: 15 March 2016 15:24
To: Jeevan Behara Patnaik (GIS) <[email protected]>; 
[email protected]
Subject: RE: [lustre-discuss] Lustre failover configuration - Need help in 
selecting storage

One of the reasons to use the same connection type for the IO path to the 
storage is to ensure consistency in performance regardless of which server the 
storage is mounted on. However, there is another reason for using a symmetrical 
IO path: Lustre systems are designed to distribute the IO workload across 
multiple servers in parallel, maximising the available throughput across the 
network and the disk IO, with each server delivering the same level of 
performance.

The servers are usually configured into building blocks of paired clusters for 
HA (that is, 2 servers attached to one or more shared DAS arrays). The storage 
is split into multiple LUNs, with half of the LUNs presented to one server, 
half to the other server in the pair. This means that each server is able to 
transact IO, each server has the same performance characteristics and there are 
no idle or passive servers. The result is maximum utilisation and consistent 
performance across all the servers in the network.

For example, if you have 2 OSS servers (oss1, oss2) connected to a 60 disk tray 
split into 6 RAID 6 (8+2) LUNs, then 3 LUNs would be primary targets on oss1, 3 
on oss2, and you'd allow the LUNs to migrate on failover. This way, each of the 
servers is active on the network, maximising the available throughput 
performance of the file system. Similarly, the MGT and MDT are commonly paired 
into a metadata server pair.
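
To make the example above concrete, here is a minimal Python sketch of the 
layout. It is only an illustration and does not use any Lustre tooling: oss1 
and oss2 come from the example, while the LUN labels and function names are 
invented for this sketch.

```python
def assign_primaries(luns, servers):
    """Give each LUN a primary server round-robin; the next server is its failover."""
    layout = {}
    for i, lun in enumerate(luns):
        primary = servers[i % len(servers)]
        failover = servers[(i + 1) % len(servers)]
        layout[lun] = {"primary": primary, "failover": failover}
    return layout


def active_targets(layout, failed=frozenset()):
    """Return the server currently serving each LUN, given a set of failed servers."""
    serving = {}
    for lun, node in layout.items():
        if node["primary"] not in failed:
            serving[lun] = node["primary"]
        elif node["failover"] not in failed:
            serving[lun] = node["failover"]
        else:
            serving[lun] = None  # both servers in the pair are down
    return serving


# 6 RAID 6 (8+2) LUNs carved from a 60-disk tray; labels are hypothetical.
luns = [f"lun{i}" for i in range(6)]
layout = assign_primaries(luns, ["oss1", "oss2"])

normal = active_targets(layout)                          # 3 LUNs on each server
after_failure = active_targets(layout, failed={"oss1"})  # all 6 LUNs on oss2

print(normal)
print(after_failure)
```

In normal operation each server carries three LUNs; when oss1 fails, its three 
LUNs migrate to oss2, which then serves all six.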

If you create an imbalance in the performance of each IO path, then the 2nd 
server is going to end up as a passive node only, rather than being another 
server to scale out the bandwidth.
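
As a rough illustration of the point above, the following Python sketch 
compares a symmetric pair with an imbalanced one. It is a deliberately 
simplified model: it assumes the aggregate throughput of a pair is the sum of 
the IO-path bandwidths of the servers currently serving targets, and all 
bandwidth figures are hypothetical.

```python
def aggregate_throughput(path_bw, serving):
    """Sum the IO-path bandwidth of every server that is currently serving targets."""
    return sum(path_bw[server] for server in serving)


# Hypothetical IO-path bandwidths in GB/s.
symmetric = {"oss1": 4.0, "oss2": 4.0}   # both servers direct-attached
imbalanced = {"oss1": 4.0, "oss2": 1.0}  # oss2 reaches the storage over a slower path

# Symmetric pair run active/active: both servers serve targets.
sym_normal = aggregate_throughput(symmetric, ["oss1", "oss2"])
sym_failover = aggregate_throughput(symmetric, ["oss2"])

# Imbalanced pair run active/passive: oss2 only serves after oss1 fails.
imb_normal = aggregate_throughput(imbalanced, ["oss1"])
imb_failover = aggregate_throughput(imbalanced, ["oss2"])

print("symmetric :", sym_normal, "GB/s normal,", sym_failover, "GB/s after failover")
print("imbalanced:", imb_normal, "GB/s normal,", imb_failover, "GB/s after failover")
```

Under these assumptions the symmetric pair delivers twice the bandwidth in 
normal operation, and degrades to the bandwidth of a single server on failover 
rather than to that of the slower path.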

I've attached some example diagrams (apologies to the list for the additional 
120KB or so; I'm not sure the list will actually accept it :) ) that highlight, 
at a very high level, a fairly well-used pattern for the metadata and OSS 
servers for HA. They are just pictures, but enough to get an idea of what 
Lustre is about. Where costs are a concern, one can also investigate use of JBODs, 
although they add their own complexity with regard to storage management 
(identifying failed disks, etc.). ZFS is gaining popularity as a storage 
platform but has its own challenges as well.

Malcolm Cowe
High Performance Data Division
Intel Corporation | www.intel.com<http://www.intel.com>

From: [email protected]<mailto:[email protected]> 
[mailto:[email protected]]
Sent: Tuesday, March 15, 2016 6:18 PM
To: [email protected]<mailto:[email protected]>; 
[email protected]<mailto:[email protected]>; Cowe, 
Malcolm J
Subject: RE: [lustre-discuss] Lustre failover configuration - Need help in 
selecting storage

Thanks Ben and Malcolm,

Yes, now I have an idea of what to do. I had thought that a multiport DAS 
enclosure that two servers can share would be hard to find. Also, if cost is a 
concern, we can still use a directly attached primary node and a 
network-attached failover node.

--
Regards,
Jeevan.

From: Cowe, Malcolm J [mailto:[email protected]]
Sent: 15 March 2016 01:58
To: Jeevan Behara Patnaik (GIS) 
<[email protected]<mailto:[email protected]>>; 
[email protected]<mailto:[email protected]>
Subject: RE: [lustre-discuss] Lustre failover configuration - Need help in 
selecting storage

Why not use a multi-ported direct attached storage (DAS) enclosure? Performance 
is retained and configuration is straightforward. There are a number of such 
enclosures available from a range of vendors, many of whom have solutions that 
have been qualified with Lustre.

Malcolm Cowe
High Performance Data Division
Intel Corporation | www.intel.com<http://www.intel.com>


From: Ben Evans [mailto:[email protected]]
Sent: 14 March 2016 19:15
To: Jeevan Behara Patnaik (GIS) 
<[email protected]<mailto:[email protected]>>; 
[email protected]<mailto:[email protected]>
Subject: Re: [lustre-discuss] Lustre failover configuration - Need help in 
selecting storage

You'll only go as fast as your slowest piece.

With that in mind, first figure out what sorts of bandwidth you can actually 
get across your chosen network type (per server).  That will dictate how fast 
you want your storage to be.  Benchmark it, make sure you can get the I/O over 
the wire that you think you can for that one server.

Next, find a disk system that can deliver that speed for you (you'll be able to 
get some of the info, but you'll want to benchmark that as well, with different 
RAID configurations, settings, etc.).  You may want to overprovision storage 
speed, since you probably won't be getting ideal throughput numbers.
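
A minimal Python sketch of that sizing logic follows. The stage names, the 
benchmark figures, and the 75% efficiency factor are all hypothetical 
placeholders; substitute the numbers from your own benchmarks.

```python
def find_bottleneck(stage_bw):
    """Return (stage name, bandwidth) for the slowest stage in one server's IO path."""
    slowest = min(stage_bw, key=stage_bw.get)
    return slowest, stage_bw[slowest]


def storage_to_provision(network_bw, efficiency):
    """Nominal storage bandwidth needed so the delivered bandwidth matches the network."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return network_bw / efficiency


# Hypothetical per-server benchmark results, in GB/s.
measured = {
    "client network": 5.0,
    "storage (benchmarked)": 4.5,
}
stage, bw = find_bottleneck(measured)
print(f"Bottleneck: {stage} at {bw} GB/s")

# Assume, for illustration, that storage delivers 75% of its nominal figure.
target = storage_to_provision(network_bw=5.0, efficiency=0.75)
print(f"Provision roughly {target:.1f} GB/s of nominal storage bandwidth")
```

The server can go no faster than the slowest stage, so the storage is 
overprovisioned until its benchmarked figure is no longer that stage.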

As to redundancy, there are a number of direct-attach systems that allow you to 
connect two servers to the same set of disks.  You don't need (or really want) 
anything fancy like a SAN.

Given the cost/performance ratios, you might also experiment with a few smaller 
OSTs made up of SSDs, or using something like flashcache on the MDT(s).

-Ben Evans

From: lustre-discuss 
<[email protected]<mailto:[email protected]>>
 on behalf of "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Date: Monday, March 14, 2016 at 8:36 AM
To: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Subject: [lustre-discuss] Lustre failover configuration - Need help in 
selecting storage



We need storage specifically for an HPC Lustre failover setup, where two 
servers must share the same block-level storage in order to support a failover 
configuration.

With very limited knowledge of hardware, my understanding is as follows:
* NAS can be used for shared storage, but speed will be bottlenecked by the 
  intermediate network.
* SAN can be used, but it is costly to implement and not really needed for 
  50-100TB of storage.
* Even if we find a storage enclosure with multiple iSCSI ports, the storage 
  can be used only by splitting it, i.e., it works as two storage devices and 
  the same storage can't be used by both systems. (One thing to note here: in 
  the Lustre setup, both servers would be attached, but only one would be used 
  at a time. I am not sure how that is possible and need to check on this.)
* Having two virtual machines may be one way to do it. But that does not 
  really help for failover, as there would be only one physical machine.

But, while posting the question, I am thinking, maybe we can compromise on 
speed in NAS, if we try having one directly attached server (primary) and the 
other attached via network (failover), so we face slowness only when the 
primary stops working.

I posted a similar question on Server Fault: 
http://serverfault.com/questions/763569/is-it-possible-to-have-a-directly-attached-shared-storage-accessed-at-block-lev
and got the following response:
"Have you actually attempted to set up a proof of concept, or at least looked 
through the documentation<http://doc.lustre.org/lustre_manual.xhtml>? Lustre 
really doesn't care very much how you connect to the underlying storage, so you 
can do whatever gets you the bandwidth you need."

So, is it true that we don't need to worry about bandwidth of the storage 
server?

For example, the communication paths, as I understand them, are as follows:

==>  Client <----> MGS (Ethernet)

==>  MGS <----> MGT (Direct/ISCSI)

==>  MGS <----> MDS (Ethernet/Internal Communication)

==>  MDS <----> MDT (Direct/ISCSI/Ethernet)

==>  MDS <----> OSS (Ethernet)

==>  OSS <----> OST (Direct/ISCSI/Ethernet)

==>  OST <----> Client (Ethernet)

Does that mean performance won't be affected at any stage if iSCSI is replaced 
by Ethernet, or if a link with limited bandwidth is used?

--
Thanks and Regards,
Jeevan Patnaik B| Project Engineer
Nokia IT - HEE Platform | WIPRO Technologies - Hyderabad
Mob: +91-9000607181| Off: +91-4030970347.

The information contained in this electronic message and any attachments to 
this message are intended for the exclusive use of the addressee(s) and may 
contain proprietary, confidential or privileged information. If you are not the 
intended recipient, you should not disseminate, distribute or copy this e-mail. 
Please notify the sender immediately and destroy all copies of this message and 
any attachments. WARNING: Computer viruses can be transmitted via email. The 
recipient should check this email and any attachments for the presence of 
viruses. The company accepts no liability for any damage caused by any virus 
transmitted by this email. www.wipro.com<http://www.wipro.com>
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
