On Wed, May 9, 2018 at 12:07 PM, Makia Minich
<[email protected]> wrote:
>> On May 9, 2018, at 10:02 AM, Michael Di Domenico <[email protected]> 
>> wrote:
>>
>> On Wed, May 9, 2018 at 9:50 AM, Makia Minich
>> <[email protected]> wrote:
>>>
>>> I have an LNET routing question. I’ve attached a quick diagram of the 
>>> current setup; but basically I have two core networks (one infiniband and 
>>> one ethernet) with a set of LNET routers in between. There are storage and
>>> clients on both sides of these routers, and all clients need to see all/most
>>> storage. All connections, configurations, etc. are working.
>>>
>>> The question is, if an LNET router goes down (which does cause some amount 
>>> of reconnect or remapping for any clients attempting to use those routes) 
>>> would this cause any issues or delays for a client’s connection to 
>>> non-routed storage? Put slightly differently, if a job on the ethernet
>>> clients is actively using ethernet storage and the lnet routers go down,
>>> will the job be affected? What about a new job just launching when that lnet
>>> router is down?
>>
>> just for the sake of clarity when you say "routers down" do you mean
>> all routers or just one/two?
>
> Thanks for the question, I should have made that clearer. For this question, 
> I was thinking of a single router (and no fine-grained routing). I’d also
> ask what would happen if all routers are down: understood that you’d see
> hangs for any mounts that are LNET-router based, but local-network mounts 
> “should” remain unaffected, right?


unfortunately i can't offer any firm advice, but my theory would be the
same as yours.  losing one or more routers just cuts the
bandwidth between the client and the remote storage, and losing all the
routers would stop connectivity to the remote storage from the
respective client.  i would not expect that loss to prevent
communication with other storage servers on the client's local network.
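
to make that theory concrete, here is a toy python sketch.  it is not
lustre code and says nothing about timeouts or reconnect delays; it only
models the reachability and bandwidth reasoning above.  the network names
(tcp0, o2ib0), the router count, and both function names are made up for
illustration, and the bandwidth figure assumes identical routers with
evenly spread load.

```python
# toy model, not Lustre code: reachability and routed bandwidth only.
# all names and numbers here are hypothetical.

def reachable_networks(client_net, storage_nets, routers_up):
    """Return the set of storage networks a client on client_net can reach.

    routers_up is the number of live LNET routers joining the two networks.
    """
    reachable = set()
    for net in storage_nets:
        if net == client_net:
            # same network as the client: traffic never touches a router
            reachable.add(net)
        elif routers_up > 0:
            # remote network: needs at least one live router
            reachable.add(net)
    return reachable


def routed_bandwidth_fraction(routers_up, routers_total):
    """Share of routed bandwidth left, assuming identical, evenly loaded routers."""
    return routers_up / routers_total


if __name__ == "__main__":
    nets = ["tcp0", "o2ib0"]
    # one of four routers down: both networks reachable, less routed bandwidth
    print(reachable_networks("tcp0", nets, 3), routed_bandwidth_fraction(3, 4))
    # all routers down: only the client's own network remains
    print(reachable_networks("tcp0", nets, 0), routed_bandwidth_fraction(0, 4))
```

in this model an ethernet client keeps its ethernet storage no matter how
many routers are up, which is the expectation above; whether real clients
see delays while dead routes are detected is exactly the part it leaves out.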
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
