Makia,
When I tested a similar configuration with routed and non-routed
clients, I found that all clients, routed and non-routed, should
have the same lnet.conf parameters defined, apart from the specific
ip2nets and routes params. When the non-routed clients did not have the
same settings as the routed clients, IO would hang when I failed a
router node, and this affected the non-routed clients as well.
But when all clients had the same parameters and I ran the exact same
test, IO continued for both routed and non-routed clients. I even tested
bringing down all router nodes, and the non-routed clients were not affected.
The other aspect I tested was 1MB and 4MB RPC sizes. The RPC size has to
be the same on all clients. I have not tested 16MB RPC yet. I also used
the same client tuning across all clients, routed and non-routed.
I suggest you test this configuration on a small setup, if you can, to
verify before production use.
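As a rough way to check the parameter consistency described above, here is a minimal Python sketch that compares the lnet options of two clients while ignoring ip2nets and routes. It assumes the parameters live on modprobe-style "options lnet key=value" lines. The function names, network values, and parameter values are hypothetical and only for illustration, not taken from any tested setup.

```python
import shlex

# Parameters expected to differ between routed and non-routed clients.
SITE_SPECIFIC = {"ip2nets", "routes"}


def parse_lnet_options(text):
    """Collect key=value pairs from 'options lnet ...' lines into a dict."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        tokens = shlex.split(line)  # keeps quoted values together
        if tokens[:2] != ["options", "lnet"]:
            continue
        for token in tokens[2:]:
            key, _, value = token.partition("=")
            params[key] = value
    return params


def shared_param_mismatches(a, b, ignore=SITE_SPECIFIC):
    """Return {param: (value_a, value_b)} for every non-ignored difference."""
    keys = (set(a) | set(b)) - set(ignore)
    return {k: (a.get(k), b.get(k)) for k in sorted(keys) if a.get(k) != b.get(k)}


# Illustrative contents only; real files will differ.
routed_conf = (
    'options lnet ip2nets="o2ib0(ib0) 10.10.0.*" '
    'routes="tcp0 10.10.0.[1-2]@o2ib0" '
    "check_routers_before_use=1 router_ping_timeout=50"
)
non_routed_conf = (
    'options lnet ip2nets="tcp0(eth0) 10.20.0.*" '
    "check_routers_before_use=1"
)

routed = parse_lnet_options(routed_conf)
non_routed = parse_lnet_options(non_routed_conf)

# Report any parameter that is set differently on the two clients.
for name, (r, n) in shared_param_mismatches(routed, non_routed).items():
    print(f"{name}: routed={r!r} non-routed={n!r}")
```

In this made-up example the non-routed client is missing router_ping_timeout, which is the kind of difference I would remove before running a router failover test.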
Thanks.
jnf
--
John Fragalla
Senior Storage Engineer
High Performance Computing
Cray Inc.
[email protected]
+1-951-258-7629
On 5/9/18 6:50 AM, Makia Minich wrote:
Hello all,
I have an LNET routing question. I’ve attached a quick diagram of the
current setup; but basically I have two core networks (one InfiniBand
and one Ethernet) with a set of LNET routers in between. There are
storage and clients on both sides of these routers, and all clients
need to see all/most storage. All connections, configurations, etc. are
working.
The question is, if an LNET router goes down (which does cause some
amount of reconnect or remapping for any clients attempting to use
those routes) would this cause any issues or delays for a client’s
connection to non-routed storage? Put slightly differently, if a job on
the Ethernet clients is actively using Ethernet storage and the LNET
routers go down, will the job be affected? What about a new job just
launching when that LNET router is down?
In addition, what does “check_routers_before_use” actually do and does
it change the scenarios I mentioned? (e.g. If an ethernet client has
“check_routers_before_use” would every file request start with a ping
to the routers even if it’s not leaving its core network?)
Thanks!
—
Makia Minich
Principal Architect
System Fabric Works
"Fabric Computing that Works"
"Oh, I don't know. I think everything is just as it should be, y'know?"
- Frank Fairfield
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org