Thank you Ravi, after checking the gfid within the bricks I think someone made modifications directly inside the brick and not through the mount point...
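In case it's useful, this is more or less how I compared them on each node (assuming the same services/snooper relative path under each brick, which is how the bricks are laid out here):

# run as root on ipvr7, ipvr8 and ipvr9, against the brick path (not the mount)
getfattr -n trusted.gfid -e hex /mnt/gluster-applicatif/brick/services/snooper
# or dump all the xattrs, as you asked for earlier in the thread
getfattr -d -m . -e hex /mnt/gluster-applicatif/brick/services/snooper

The two data bricks don't report the same trusted.gfid for snooper, which matches the mismatch in the log quoted below.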
Well, I'll try to fix it; it's all in my hands. Thanks again, have a nice day.

On 07/07/2017 12:28 PM, Ravishankar N wrote:
> On 07/07/2017 03:39 PM, Florian Leleu wrote:
>>
>> I guess you're right about the gfid, I got this:
>>
>> [2017-07-07 07:35:15.197003] W [MSGID: 108008]
>> [afr-self-heal-name.c:354:afr_selfheal_name_gfid_mismatch_check]
>> 0-applicatif-replicate-0: GFID mismatch for
>> <gfid:3fa785b5-4242-4816-a452-97da1a5e45c6>/snooper
>> b9222041-72dd-43a3-b0ab-4169dbd9a87f on applicatif-client-1 and
>> 60056f98-20f8-4949-a4ae-81cc1a139147 on applicatif-client-0
>>
>> Can you tell me how I can fix that? If it helps, I don't mind
>> deleting the whole snooper folder, I have a backup.
>>
>
> The steps listed in "Fixing Directory entry split-brain:" of
> https://gluster.readthedocs.io/en/latest/Troubleshooting/split-brain/
> should give you an idea. It is for files whose gfids mismatch, but the
> steps are similar for directories too.
> If the contents of snooper are the same on all bricks, you could also
> try directly deleting the directory from one of the bricks and
> immediately doing an `ls snooper` from the mount to trigger heals to
> recreate the entries.
> Hope this helps,
> Ravi
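That helps a lot. If the contents of snooper really are the same on both data bricks (I'll check first), and since I have a backup anyway, I'll probably try the second option. Very roughly, here is the sketch I have in mind from your mail and from the doc; the brick I pick and the exact .glusterfs path are my own guesses and I'll double-check them against the doc before running anything:

# on the ONE brick whose copy I decide to drop (say ipvr8), after verifying the backup
rm -rf /mnt/gluster-applicatif/brick/services/snooper
# the doc also mentions removing the matching gfid entry under .glusterfs on that brick;
# if I read the client numbering right (applicatif-client-1 = ipvr8), that would be:
rm -f /mnt/gluster-applicatif/brick/.glusterfs/b9/22/b9222041-72dd-43a3-b0ab-4169dbd9a87f
# then immediately, from a client mount, trigger the heal as you suggested
ls -l /home/applicatif/services/snooper

Does that look about right to you?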
>>
>> Thanks.
>>
>> On 07/07/2017 11:54 AM, Ravishankar N wrote:
>>> What does the mount log say when you get the EIO error on snooper?
>>> Check if there is a gfid mismatch on the snooper directory or the files
>>> under it on all 3 bricks. In any case, the mount log or the
>>> glustershd.log of the 3 nodes, for the gfids you listed below, should
>>> give you some idea of why the files aren't healed.
>>> Thanks.
>>>
>>> On 07/07/2017 03:10 PM, Florian Leleu wrote:
>>>>
>>>> Hi Ravi,
>>>>
>>>> thanks for your answer, sure, there you go:
>>>>
>>>> # gluster volume heal applicatif info
>>>> Brick ipvr7.xxx:/mnt/gluster-applicatif/brick
>>>> <gfid:e3b5ef36-a635-4e0e-bd97-d204a1f8e7ed>
>>>> <gfid:f8030467-b7a3-4744-a945-ff0b532e9401>
>>>> <gfid:def47b0b-b77e-4f0e-a402-b83c0f2d354b>
>>>> <gfid:46f76502-b1d5-43af-8c42-3d833e86eb44>
>>>> <gfid:d27a71d2-6d53-413d-b88c-33edea202cc2>
>>>> <gfid:7e7f02b2-3f2d-41ff-9cad-cd3b5a1e506a>
>>>> Status: Connected
>>>> Number of entries: 6
>>>>
>>>> Brick ipvr8.xxx:/mnt/gluster-applicatif/brick
>>>> <gfid:47ddf66f-a5e9-4490-8cd7-88e8b812cdbd>
>>>> <gfid:8057d06e-5323-47ff-8168-d983c4a82475>
>>>> <gfid:5b2ea4e4-ce84-4f07-bd66-5a0e17edb2b0>
>>>> <gfid:baedf8a2-1a3f-4219-86a1-c19f51f08f4e>
>>>> <gfid:8261c22c-e85a-4d0e-b057-196b744f3558>
>>>> <gfid:842b30c1-6016-45bd-9685-6be76911bd98>
>>>> <gfid:1fcaef0f-c97d-41e6-87cd-cd02f197bf38>
>>>> <gfid:9d041c80-b7e4-4012-a097-3db5b09fe471>
>>>> <gfid:ff48a14a-c1d5-45c6-a52a-b3e2402d0316>
>>>> <gfid:01409b23-eff2-4bda-966e-ab6133784001>
>>>> <gfid:c723e484-63fc-4267-b3f0-4090194370a0>
>>>> <gfid:fb1339a8-803f-4e29-b0dc-244e6c4427ed>
>>>> <gfid:056f3bba-6324-4cd8-b08d-bdf0fca44104>
>>>> <gfid:a8f6d7e5-0ff2-4747-89f3-87592597adda>
>>>> <gfid:3f6438a0-2712-4a09-9bff-d5a3027362b4>
>>>> <gfid:392c8e2f-9da4-4af8-a387-bfdfea2f404e>
>>>> <gfid:37e1edfd-9f58-4da3-8abe-819670c70906>
>>>> <gfid:15b7cdb3-aae8-4ca5-b28c-e87a3e599c9b>
>>>> <gfid:1d087e51-fb40-4606-8bb5-58936fb11a4c>
>>>> <gfid:bb0352b9-4a5e-4075-9179-05c3a5766cf4>
>>>> <gfid:40133fcf-a1fb-4d60-b169-e2355b66fb53>
>>>> <gfid:00f75963-1b4a-4d75-9558-36b7d85bd30b>
>>>> <gfid:2c0babdf-c828-475e-b2f5-0f44441fffdc>
>>>> <gfid:bbeff672-43ef-48c9-a3a2-96264aa46152>
>>>> <gfid:6c0969dd-bd30-4ba0-a7e5-ba4b3a972b9f>
>>>> <gfid:4c81ea14-56f4-4b30-8fff-c088fe4b3dff>
>>>> <gfid:1072cda3-53c9-4b95-992d-f102f6f87209>
>>>> <gfid:2e8f9f29-78f9-4402-bc0c-e63af8cf77d6>
>>>> <gfid:eeaa2765-44f4-4891-8502-5787b1310de2>
>>>> Status: Connected
>>>> Number of entries: 29
>>>>
>>>> Brick ipvr9.xxx:/mnt/gluster-applicatif/brick
>>>> <gfid:e3b5ef36-a635-4e0e-bd97-d204a1f8e7ed>
>>>> <gfid:f8030467-b7a3-4744-a945-ff0b532e9401>
>>>> <gfid:def47b0b-b77e-4f0e-a402-b83c0f2d354b>
>>>> <gfid:46f76502-b1d5-43af-8c42-3d833e86eb44>
>>>> <gfid:d27a71d2-6d53-413d-b88c-33edea202cc2>
>>>> <gfid:7e7f02b2-3f2d-41ff-9cad-cd3b5a1e506a>
>>>> Status: Connected
>>>> Number of entries: 6
>>>>
>>>> # gluster volume heal applicatif info split-brain
>>>> Brick ipvr7.xxx:/mnt/gluster-applicatif/brick
>>>> Status: Connected
>>>> Number of entries in split-brain: 0
>>>>
>>>> Brick ipvr8.xxx:/mnt/gluster-applicatif/brick
>>>> Status: Connected
>>>> Number of entries in split-brain: 0
>>>>
>>>> Brick ipvr9.xxx:/mnt/gluster-applicatif/brick
>>>> Status: Connected
>>>> Number of entries in split-brain: 0
>>>>
>>>> Doesn't it seem odd that the first command gives different output?
>>>>
>>>> On 07/07/2017 11:31 AM, Ravishankar N wrote:
>>>>> On 07/07/2017 01:23 PM, Florian Leleu wrote:
>>>>>>
>>>>>> Hello everyone,
>>>>>>
>>>>>> first time on the ML, so excuse me if I'm not following the rules
>>>>>> well; I'll improve if I get comments.
>>>>>>
>>>>>> We have one volume "applicatif" on three nodes (2 and 1 arbiter);
>>>>>> each of the following commands was run on node ipvr8.xxx:
>>>>>>
>>>>>> # gluster volume info applicatif
>>>>>>
>>>>>> Volume Name: applicatif
>>>>>> Type: Replicate
>>>>>> Volume ID: ac222863-9210-4354-9636-2c822b332504
>>>>>> Status: Started
>>>>>> Snapshot Count: 0
>>>>>> Number of Bricks: 1 x (2 + 1) = 3
>>>>>> Transport-type: tcp
>>>>>> Bricks:
>>>>>> Brick1: ipvr7.xxx:/mnt/gluster-applicatif/brick
>>>>>> Brick2: ipvr8.xxx:/mnt/gluster-applicatif/brick
>>>>>> Brick3: ipvr9.xxx:/mnt/gluster-applicatif/brick (arbiter)
>>>>>> Options Reconfigured:
>>>>>> performance.read-ahead: on
>>>>>> performance.cache-size: 1024MB
>>>>>> performance.quick-read: off
>>>>>> performance.stat-prefetch: on
>>>>>> performance.io-cache: off
>>>>>> transport.address-family: inet
>>>>>> performance.readdir-ahead: on
>>>>>> nfs.disable: off
>>>>>>
>>>>>> # gluster volume status applicatif
>>>>>> Status of volume: applicatif
>>>>>> Gluster process                                TCP Port  RDMA Port  Online  Pid
>>>>>> ------------------------------------------------------------------------------
>>>>>> Brick ipvr7.xxx:/mnt/gluster-applicatif/brick  49154     0          Y       2814
>>>>>> Brick ipvr8.xxx:/mnt/gluster-applicatif/brick  49154     0          Y       2672
>>>>>> Brick ipvr9.xxx:/mnt/gluster-applicatif/brick  49154     0          Y       3424
>>>>>> NFS Server on localhost                        2049      0          Y       26530
>>>>>> Self-heal Daemon on localhost                  N/A       N/A        Y       26538
>>>>>> NFS Server on ipvr9.xxx                        2049      0          Y       12238
>>>>>> Self-heal Daemon on ipvr9.xxx                  N/A       N/A        Y       12246
>>>>>> NFS Server on ipvr7.xxx                        2049      0          Y       2234
>>>>>> Self-heal Daemon on ipvr7.xxx                  N/A       N/A        Y       2243
>>>>>>
>>>>>> Task Status of Volume applicatif
>>>>>> ------------------------------------------------------------------------------
>>>>>> There are no active volume tasks
>>>>>>
>>>>>> The volume is mounted with autofs (NFS) in /home/applicatif, and
>>>>>> one folder is "broken":
>>>>>>
>>>>>> l /home/applicatif/services/
>>>>>> ls: cannot access /home/applicatif/services/snooper: Input/output error
>>>>>> total 16
>>>>>> lrwxrwxrwx  1 applicatif applicatif    9 Apr  6 15:53 config -> ../config
>>>>>> lrwxrwxrwx  1 applicatif applicatif    7 Apr  6 15:54 .pwd -> ../.pwd
>>>>>> drwxr-xr-x  3 applicatif applicatif 4096 Apr 12 10:24 querybuilder
>>>>>> d?????????  ? ?          ?             ?            ? snooper
>>>>>> drwxr-xr-x  3 applicatif applicatif 4096 Jul  6 02:57 snooper_new
>>>>>> drwxr-xr-x 16 applicatif applicatif 4096 Jul  6 02:58 snooper_old
>>>>>> drwxr-xr-x  4 applicatif applicatif 4096 Jul  4 23:45 ssnooper
>>>>>>
>>>>>> I checked whether there was a heal pending, and it seems so:
>>>>>>
>>>>>> # gluster volume heal applicatif statistics heal-count
>>>>>> Gathering count of entries to be healed on volume applicatif has
>>>>>> been successful
>>>>>>
>>>>>> Brick ipvr7.xxx:/mnt/gluster-applicatif/brick
>>>>>> Number of entries: 8
>>>>>>
>>>>>> Brick ipvr8.xxx:/mnt/gluster-applicatif/brick
>>>>>> Number of entries: 29
>>>>>>
>>>>>> Brick ipvr9.xxx:/mnt/gluster-applicatif/brick
>>>>>> Number of entries: 8
>>>>>>
>>>>>> But on the brick of each server the folder "snooper" is actually fine.
>>>>>>
>>>>>> I tried rebooting the servers and restarting gluster after killing
>>>>>> every process using it, but it's not working.
>>>>>>
>>>>>> Has anyone experienced this before? Any help would be nice.
>>>>>>
>>>>>
>>>>> Can you share the output of `gluster volume heal <volname> info`
>>>>> and `gluster volume heal <volname> info split-brain`? If the
>>>>> second command shows entries, please also share the getfattr
>>>>> output from the bricks for these files (getfattr -d -m . -e hex
>>>>> /brick/path/to/file).
>>>>> -Ravi
>>>>>>
>>>>>> Thanks a lot!
>>>>>>
>>>>>
>>>>
>>>
>>
>

--
Best regards,

Florian LELEU
Hosting Manager, Cognix Systems

Rennes | Brest | Saint-Malo | Paris
[email protected]
Tel.: 02 99 27 75 92

Facebook: https://www.facebook.com/cognix.systems/
Twitter: https://twitter.com/cognixsystems
Web: http://www.cognix-systems.com/
_______________________________________________
Gluster-users mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-users
