It could mean that there are network issues with that one particular client.
If the client loses connectivity to an OST for some reason (even if the problem
is on the client side), requests would time out and the client would assume the
target OST is unavailable.  The client would then try to reconnect to the
target on the failover node, but since the target is not available on the
failover node (because no failover occurred), I believe that node would log a
message like the one you have seen.  The fact that you see errors on multiple
servers from the same client makes me think the problem is on the client.
Maybe the network connection is flapping up and down?
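
If you want to check that pattern across your servers, here is a rough Python
sketch (my own, not a Lustre tool) that scans collected syslog lines for the
137-5 message and groups the reporting servers and targets by client NID.  It
assumes each kernel message sits on a single line in the log, in the format
you quoted.  The function names, the NID 192.0.2.17@tcp1, and the second
server/target pair (oss011, fs-OST00b4) are made up for illustration.

```python
import re
from collections import defaultdict

# Matches the "137-5 ... not available for connect" message quoted in this
# thread. Assumption: the whole kernel message is on one syslog line.
PATTERN = re.compile(
    r"(?P<server>\S+) kernel: LustreError: 137-5: "
    r"(?P<target>\S+?)_UUID: not available for connect from (?P<nid>\S+)"
)


def servers_by_client(lines):
    """Map each client NID to the set of (server, target) pairs that logged it."""
    seen = defaultdict(set)
    for line in lines:
        match = PATTERN.search(line)
        if match:
            seen[match.group("nid")].add(
                (match.group("server"), match.group("target"))
            )
    return seen


def clients_on_multiple_servers(lines):
    """Return the NIDs reported by more than one distinct server, sorted."""
    result = []
    for nid, pairs in servers_by_client(lines).items():
        servers = {server for server, _target in pairs}
        if len(servers) > 1:
            result.append(nid)
    return sorted(result)


if __name__ == "__main__":
    # Hypothetical sample data: the NID, oss011, and fs-OST00b4 are invented.
    sample = [
        "Dec 4 18:05:27 oss010 kernel: LustreError: 137-5: fs-OST00b0_UUID: "
        "not available for connect from 192.0.2.17@tcp1 (no target).",
        "Dec 4 18:05:29 oss011 kernel: LustreError: 137-5: fs-OST00b4_UUID: "
        "not available for connect from 192.0.2.17@tcp1 (no target).",
    ]
    for nid in clients_on_multiple_servers(sample):
        print(nid, sorted(servers_by_client(sample)[nid]))
```

A client NID that shows up from several servers at around the same time would
fit the flapping-network theory; a single server reporting many different
clients would point at that server or its failover partner instead.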

In the example you gave, is oss010 the failover node for target fs-OST00b0?

--Rick


On 12/8/23, 9:39 AM, "lustre-discuss on behalf of Backer via lustre-discuss"
<[email protected] on behalf of [email protected]> wrote:


Hi All,


Just sending this again. 




On Tue, 5 Dec 2023 at 15:03, Backer <[email protected]> wrote:


Hi All,


From time to time, I see the following messages on multiple OSSes about a
particular client IP. What does it mean? All the OSSes and OSTs are online and
have been online in the past.




Dec 4 18:05:27 oss010 kernel: LustreError: 137-5: fs-OST00b0_UUID: not 
available for connect from <client ip>@tcp1 (no target). If you are running an 
HA pair check that the target is mounted on the other server.

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
