Makia,

When I tested a similar configuration with routed and non-routed clients, I found that all clients, routed and non-routed, should have the same lnet.conf parameters defined, apart from the client-specific ip2nets and routes params.  When the non-routed clients did not have the same settings as the routed clients, IO would hang when I failed a router node, and this affected the non-routed clients.  When all clients had the same parameters and I ran the exact same test, IO continued for both routed and non-routed clients.  I even tested bringing down all router nodes, and the non-routed clients were not affected.
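
One way to check the "same parameters everywhere" condition is to diff the lnet options of a routed and a non-routed client while ignoring ip2nets and routes. The following is only a minimal sketch: it assumes lnet.conf uses modprobe-style "options lnet key=value" lines, the helper names are hypothetical, and the parameter values in the example are made up for illustration.

```python
import shlex

# Parameters that are expected to differ between routed and non-routed clients.
CLIENT_SPECIFIC = {"ip2nets", "routes"}


def parse_lnet_options(text):
    """Collect key=value pairs from 'options lnet ...' lines (assumed modprobe-style format)."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        tokens = shlex.split(line)  # keeps quoted values such as routes="..." together
        if len(tokens) < 2 or tokens[0] != "options" or tokens[1] != "lnet":
            continue
        for token in tokens[2:]:
            key, sep, value = token.partition("=")
            if sep:
                params[key] = value
    return params


def shared_param_mismatches(conf_a, conf_b):
    """Return {param: (value_a, value_b)} for every shared parameter that differs."""
    a = {k: v for k, v in parse_lnet_options(conf_a).items() if k not in CLIENT_SPECIFIC}
    b = {k: v for k, v in parse_lnet_options(conf_b).items() if k not in CLIENT_SPECIFIC}
    return {k: (a.get(k), b.get(k)) for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}


# Illustrative contents only; the addresses and values are not from a real system.
routed = (
    'options lnet ip2nets="tcp0(eth0) 10.10.*.*" '
    'routes="o2ib0 10.10.0.[1-2]@tcp0" '
    "check_routers_before_use=1 live_router_check_interval=60\n"
)
non_routed = 'options lnet ip2nets="tcp0(eth0) 10.10.*.*" check_routers_before_use=1\n'

print(shared_param_mismatches(routed, non_routed))
```

In this made-up example the non-routed client is missing live_router_check_interval, which is the kind of difference that should be removed before running a router failover test.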

The other aspect I tested was RPC size: 1MB and 4MB RPCs, with the same size set on all clients, which is required. I have not tested 16MB RPC yet. I also used the same client tuning across all clients, routed and non-routed.
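
For reference, the RPC sizes above can be expressed as page counts. This is a small sketch that assumes 4 KiB client pages and assumes the RPC size is set through the osc max_pages_per_rpc tunable; neither detail is stated in this thread, so verify both against your Lustre version.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages on the clients


def pages_per_rpc(rpc_mib, page_size=PAGE_SIZE):
    """Convert an RPC size in MiB to a page count for the assumed page size."""
    return rpc_mib * 1024 * 1024 // page_size


for size_mib in (1, 4, 16):
    print(f"{size_mib} MiB RPC -> {pages_per_rpc(size_mib)} pages")
```

Whatever value is chosen, it should be identical on routed and non-routed clients.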

I suggest you test this configuration on a small setup, if you can, to verify before production use.

Thanks.

jnf

--
John Fragalla
Senior Storage Engineer
High Performance Computing
Cray Inc.
[email protected]
+1-951-258-7629

On 5/9/18 6:50 AM, Makia Minich wrote:
Hello all,

I have an LNET routing question. I’ve attached a quick diagram of the current setup, but basically I have two core networks (one infiniband and one ethernet) with a set of LNET routers in between. There are storage and clients on both sides of these routers and all clients need to see all/most storage. All connections, configurations, etc. are working.

The question is, if an LNET router goes down (which does cause some amount of reconnect or remapping for any clients attempting to use those routes), would this cause any issues or delays for a client’s connection to non-routed storage? Put slightly differently, if a job on the ethernet clients is actively using ethernet storage and the lnet routers go down, will the job be affected? What about a new job just launching when that lnet router is down?

In addition, what does “check_routers_before_use” actually do and does it change the scenarios I mentioned? (e.g. If an ethernet client has “check_routers_before_use” set, would every file request start with a ping to the routers even if it’s not leaving its core network?)

Thanks!


—

Makia Minich
Principal Architect
System Fabric Works
"Fabric Computing that Works”

"Oh, I don't know. I think everything is just as it should be, y'know?”
- Frank Fairfield



_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
