Milos, I just managed to look into a similar issue; my analysis is at [1]. I remember you mentioned some incorrect /etc/hosts entries that led to this same problem in an earlier case. Would you mind rechecking that?
[1] http://www.gluster.org/pipermail/gluster-users/2016-December/029443.html

On Wed, Dec 14, 2016 at 2:57 AM, Miloš Čučulović - MDPI <[email protected]> wrote:
> Hi All,
>
> Moving forward with my issue, sorry for the late reply!
>
> I had some issues with the storage2 server (original volume), then decided
> to use 3.9.0, so I have the latest version.
>
> For that, I synced all the files manually to the storage server. I
> installed gluster 3.9.0 there, started it, created a new volume called
> storage, and all seems to work OK.
>
> Now I need to create my replicated volume (add a new brick on the storage2
> server). Almost all the files are there. So I was running, on the storage server:
>
> * sudo gluster peer probe storage2
> * sudo gluster volume add-brick storage replica 2 storage2:/data/data-cluster force
>
> But there I am receiving "volume add-brick: failed: Host storage2 is not
> in 'Peer in Cluster' state".
>
> Any idea?
>
> - Kindest regards,
>
> Milos Cuculovic
> IT Manager
>
> ---
> MDPI AG
> Postfach, CH-4020 Basel, Switzerland
> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
> Tel. +41 61 683 77 35
> Fax +41 61 302 89 18
> Email: [email protected]
> Skype: milos.cuculovic.mdpi
>
> On 08.12.2016 17:52, Ravishankar N wrote:
>
>> On 12/08/2016 09:44 PM, Miloš Čučulović - MDPI wrote:
>>
>>> I was able to fix the sync by rsync-ing all the directories; then the
>>> heal started. The next problem :) — as soon as there are files on the
>>> new brick, the gluster mount will also serve reads from this one, but
>>> the new brick is not ready yet, as the sync is not yet done, so it
>>> results in missing files on the client side. I temporarily removed the
>>> new brick; now I am running a manual rsync and will add the brick again,
>>> hoping this will work.
>>>
>>> What mechanism manages this? I guess there is something built in
>>> to make a replica brick available only once the data is
>>> completely synced.
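The probe-then-add-brick sequence Milos describes can be sketched as a small shell script. The hostnames, volume name, and brick path are the thread's own; the explicit wait for the peer to reach "Peer in Cluster" state before running add-brick is an addition here, since that is exactly the state check the failed command complains about:

```shell
#!/bin/sh
# Sketch of the add-brick sequence from the thread. Assumes it is run
# on the existing node ("storage") with glusterd up on both nodes.
set -e

# 1. Probe the new peer from the existing node.
gluster peer probe storage2

# 2. Wait until the peer reaches 'Peer in Cluster' state; add-brick
#    otherwise fails with "Host ... is not in 'Peer in Cluster' state".
until gluster peer status | grep -q "State: Peer in Cluster (Connected)"; do
    sleep 1
done

# 3. Convert the 1x1 distribute volume to a 1x2 replica by adding the
#    new brick with an explicit replica count.
gluster volume add-brick storage replica 2 storage2:/data/data-cluster force
```

Note that a probe can succeed while the peer handshake is still settling (or, as later in this thread, while a bad /etc/hosts entry points the hostname at the wrong address), which is why checking `gluster peer status` before add-brick is worthwhile.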
>>>
>> This mechanism was introduced in 3.7.9 or 3.7.10
>> (http://review.gluster.org/#/c/13806/). Before that version, you
>> needed to manually set some xattrs on the bricks so that healing could
>> happen in parallel while the client would still serve reads from the
>> original brick. I can't find the link to the doc which describes the
>> steps for setting those xattrs. :-(
>>
>> Calling it a day,
>> Ravi
>>
>>> - Kindest regards,
>>>
>>> Milos Cuculovic
>>>
>>> On 08.12.2016 16:17, Ravishankar N wrote:
>>>
>>>> On 12/08/2016 06:53 PM, Atin Mukherjee wrote:
>>>>
>>>>> On Thu, Dec 8, 2016 at 6:44 PM, Miloš Čučulović - MDPI
>>>>> <[email protected]> wrote:
>>>>>
>>>>> Ah, damn! I found the issue. On the storage server, the storage2
>>>>> IP address was wrong — I inverted two digits in the /etc/hosts
>>>>> file, sorry for that :(
>>>>>
>>>>> I was able to add the brick now. I started the heal, but still no
>>>>> data transfer is visible.
>>>>>
>>>> 1. Are the files getting created on the new brick, though?
>>>> 2. Can you provide the output of `getfattr -d -m . -e hex
>>>>    /data/data-cluster` on both bricks?
>>>> 3. Is it possible to attach gdb to the self-heal daemon on the original
>>>>    (old) brick and get a backtrace?
>>>>    `gdb -p <pid of self-heal daemon on the original brick>`
>>>>    thread apply all bt  --> share this output
>>>>    quit gdb.
>>>>
>>>> -Ravi
>>>>
>>>>> @Ravi/Pranith - can you help here?
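On versions with the automatic mechanism Ravi mentions, heal progress after add-brick can be watched from the CLI. A sketch, using the thread's volume name `storage` (commands assume a recent glusterfs with the heal subcommands available):

```shell
# Trigger a full heal so the freshly added, empty brick gets populated
# from the original brick.
gluster volume heal storage full

# List entries still pending heal, per brick; the list shrinks as the
# self-heal daemon copies data across.
gluster volume heal storage info

# Summary counts only, which is easier to watch in a loop.
gluster volume heal storage statistics heal-count
```

If the heal-count stays flat while the new brick remains empty, that matches the "no data transfer visible" symptom in this thread and points at the self-heal daemon rather than at the add-brick step.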
>>>>>
>>>>> Doing gluster volume status, I have:
>>>>>
>>>>> Status of volume: storage
>>>>> Gluster process                         TCP Port  RDMA Port  Online  Pid
>>>>> ------------------------------------------------------------------------------
>>>>> Brick storage2:/data/data-cluster       49152     0          Y       23101
>>>>> Brick storage:/data/data-cluster        49152     0          Y       30773
>>>>> Self-heal Daemon on localhost           N/A       N/A        Y       30050
>>>>> Self-heal Daemon on storage             N/A       N/A        Y       30792
>>>>>
>>>>> Any idea?
>>>>>
>>>>> On storage I have:
>>>>> Number of Peers: 1
>>>>>
>>>>> Hostname: 195.65.194.217
>>>>> Uuid: 7c988af2-9f76-4843-8e6f-d94866d57bb0
>>>>> State: Peer in Cluster (Connected)
>>>>>
>>>>> - Kindest regards,
>>>>>
>>>>> Milos Cuculovic
>>>>>
>>>>> On 08.12.2016 13:55, Atin Mukherjee wrote:
>>>>>
>>>>> Can you resend the attachment as a zip? I am unable to extract the
>>>>> content. We shouldn't have a zero-byte info file. What does gluster
>>>>> peer status output say?
>>>>>
>>>>> On Thu, Dec 8, 2016 at 4:51 PM, Miloš Čučulović - MDPI
>>>>> <[email protected]> wrote:
>>>>>
>>>>> I hope you received my last email, Atin, thank you!
>>>>>
>>>>> - Kindest regards,
>>>>>
>>>>> Milos Cuculovic
>>>>>
>>>>> On 08.12.2016 10:28, Atin Mukherjee wrote:
>>>>>
>>>>> ---------- Forwarded message ----------
>>>>> From: Atin Mukherjee <[email protected]>
>>>>> Date: Thu, Dec 8, 2016 at 11:56 AM
>>>>> Subject: Re: [Gluster-users] Replica brick not working
>>>>> To: Ravishankar N <[email protected]>
>>>>> Cc: Miloš Čučulović - MDPI <[email protected]>,
>>>>>     Pranith Kumar Karampuri <[email protected]>,
>>>>>     gluster-users <[email protected]>
>>>>>
>>>>> On Thu, Dec 8, 2016 at 11:11 AM, Ravishankar N <[email protected]>
>>>>> wrote:
>>>>>
>>>>> On 12/08/2016 10:43 AM, Atin Mukherjee wrote:
>>>>>
>>>>> From the log snippet:
>>>>>
>>>>> [2016-12-07 09:15:35.677645] I [MSGID: 106482]
>>>>> [glusterd-brick-ops.c:442:__glusterd_handle_add_brick]
>>>>> 0-management: Received add brick req
>>>>> [2016-12-07 09:15:35.677708] I [MSGID: 106062]
>>>>> [glusterd-brick-ops.c:494:__glusterd_handle_add_brick]
>>>>> 0-management: replica-count is 2
>>>>> [2016-12-07 09:15:35.677735] E [MSGID: 106291]
>>>>> [glusterd-brick-ops.c:614:__glusterd_handle_add_brick]
>>>>> 0-management:
>>>>>
>>>>> The last log entry indicates that we hit the code path in
>>>>> gd_addbr_validate_replica_count():
>>>>>
>>>>>     if (replica_count == volinfo->replica_count) {
>>>>>         if (!(total_bricks % volinfo->dist_leaf_count)) {
>>>>>             ret = 1;
>>>>>             goto out;
>>>>>         }
>>>>>     }
>>>>>
>>>>> It seems unlikely that this snippet was hit, because we print the E
>>>>> [MSGID: 106291] in the above message only if ret == -1.
>>>>> gd_addbr_validate_replica_count() returns -1 and yet does not
>>>>> populate err_str only when volinfo->type doesn't match any of the
>>>>> known volume types, so volinfo->type is corrupted, perhaps?
>>>>>
>>>>> You are right, I missed that ret is set to 1 here in the above
>>>>> snippet.
>>>>>
>>>>> @Milos - Can you please provide us the volume info file from
>>>>> /var/lib/glusterd/vols/<volname>/ from all three nodes so we can
>>>>> continue the analysis?
>>>>>
>>>>> -Ravi
>>>>>
>>>>> @Pranith, Ravi - Milos was trying to convert a dist (1 x 1)
>>>>> volume to a replicate (1 x 2) using add-brick and hit this issue
>>>>> where add-brick failed. The cluster is operating with 3.7.6.
>>>>> Could you help on what scenario this code path can be hit? One
>>>>> straightforward issue I see here is the missing err_str in this path.
>>>>>
>>>>> --
>>>>> ~ Atin (atinm)
>>
>> --
>> ~ Atin (atinm)
_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users
