Hi Nithya

I tried attaching the logs but the file was too big, so I have put it on a shared drive accessible by everyone:
https://drive.google.com/drive/folders/1744WcOfrqe_e3lRPxLpQ-CBuXHp_o44T?usp=sharing

The rebalance-logs file in that folder covers the period when I ran fix-layout after adding the new bricks and then started the remove-brick operation.

All of the nodes have at least 8 TB of disk space available:

/dev/sdb   73T   65T   8.0T   90%   /glusteratlas/brick001
/dev/sdb   73T   65T   8.0T   90%   /glusteratlas/brick002
/dev/sdb   73T   65T   8.0T   90%   /glusteratlas/brick003
/dev/sdb   73T   65T   8.0T   90%   /glusteratlas/brick004
/dev/sdb   73T   65T   8.0T   90%   /glusteratlas/brick005
/dev/sdb   80T   67T   14T    83%   /glusteratlas/brick006
/dev/sdb   37T   1.6T  35T     5%   /glusteratlas/brick007
/dev/sdb   89T   15T   75T    17%   /glusteratlas/brick008
/dev/sdb   89T   14T   76T    16%   /glusteratlas/brick009

brick007 is the one I am removing.

gluster volume info

Volume Name: atlasglust
Type: Distribute
Volume ID: fbf0ebb8-deab-4388-9d8a-f722618a624b
Status: Started
Snapshot Count: 0
Number of Bricks: 9
Transport-type: tcp
Bricks:
Brick1: pplxgluster01**:/glusteratlas/brick001/gv0
Brick2: pplxgluster02.**:/glusteratlas/brick002/gv0
Brick3: pplxgluster03.**:/glusteratlas/brick003/gv0
Brick4: pplxgluster04.**:/glusteratlas/brick004/gv0
Brick5: pplxgluster05.**:/glusteratlas/brick005/gv0
Brick6: pplxgluster06.**:/glusteratlas/brick006/gv0
Brick7: pplxgluster07.**:/glusteratlas/brick007/gv0
Brick8: pplxgluster08.**:/glusteratlas/brick008/gv0
Brick9: pplxgluster09.**:/glusteratlas/brick009/gv0
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
auth.allow: ***
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.md-cache-timeout: 600
performance.parallel-readdir: off
performance.cache-size: 1GB
performance.client-io-threads: on
cluster.lookup-optimize: on
client.event-threads: 4
server.event-threads: 4
performance.cache-invalidation: on
diagnostics.brick-log-level: WARNING
diagnostics.client-log-level: WARNING
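For completeness, these are the fix-layout commands I ran after adding the new bricks (the remove-brick start command is in the quoted mail below), plus the grep I am using to count the failures. I am assuming the log sits in the default /var/log/glusterfs/<volname>-rebalance.log location on the node; please tell me if a different file would be more useful:

    # fix-layout run after the two new bricks were added
    gluster volume rebalance atlasglust fix-layout start
    gluster volume rebalance atlasglust status

    # count the "No space left on device" failures in the rebalance log
    grep -c "No space left on device" /var/log/glusterfs/atlasglust-rebalance.log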
Thanks

On Mon, Feb 4, 2019 at 11:37 AM Nithya Balachandran <[email protected]> wrote:

> Hi,
>
> On Mon, 4 Feb 2019 at 16:39, mohammad kashif <[email protected]> wrote:
>
>> Hi Nithya
>>
>> Thanks for replying so quickly. It is very much appreciated.
>>
>> There are lots of "[No space left on device]" errors which I cannot
>> understand, as there is plenty of space on all of the nodes.
>
> This means that Gluster could not find sufficient space for the file.
> Would you be willing to share your rebalance log file?
> Please provide the following information:
>
> - The gluster version
> - The gluster volume info for the volume
> - How full are the individual bricks for the volume?
>
>> A little bit of background will be useful in this case. I had a cluster
>> of seven nodes of varying capacity (73, 73, 73, 46, 46, 46, 46 TB). The
>> cluster was almost 90% full, so every node had roughly 8 to 15 TB of
>> free space. I added two new nodes with 100 TB each and ran fix-layout,
>> which completed successfully.
>>
>> After that I started the remove-brick operation. I don't think that at
>> any point any of the nodes were 100% full. Looking at my Ganglia graphs,
>> there was always a minimum of 5 TB available on every node.
>>
>> I was keeping an eye on the remove-brick status; for a very long time
>> there were no failures, and then at some point these 17000 failures
>> appeared and the count stayed like that.
>>
>> Thanks
>>
>> Kashif
>>
>> On Mon, Feb 4, 2019 at 5:09 AM Nithya Balachandran <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> The status shows quite a few failures. Please check the rebalance logs
>>> to see why that happened. We can decide what to do based on the errors.
>>> Once you run a commit, the brick will no longer be part of the volume
>>> and you will not be able to access those files via the client.
>>> Do you have sufficient space on the remaining bricks for the files on
>>> the removed brick?
>>>
>>> Regards,
>>> Nithya
>>>
>>> On Mon, 4 Feb 2019 at 03:50, mohammad kashif <[email protected]> wrote:
>>>
>>>> Hi
>>>>
>>>> I have a pure distributed gluster volume with nine nodes and am trying
>>>> to remove one node. I ran:
>>>>
>>>> gluster volume remove-brick atlasglust nodename:/glusteratlas/brick007/gv0 start
>>>>
>>>> It completed, but with around 17000 failures:
>>>>
>>>> Node       Rebalanced-files   size     scanned   failures   skipped   status      run time in h:m:s
>>>> ---------  ----------------   -------  --------  ---------  --------  ----------  ------------------
>>>> nodename   4185858            27.5TB   6746030   17488      0         completed   405:15:34
>>>>
>>>> I can see that there is still 1.5 TB of data on the node which I was
>>>> trying to remove.
>>>>
>>>> I am not sure what to do now. Should I run the remove-brick command
>>>> again so that the files which failed can be tried again?
>>>>
>>>> Or should I run commit first and then try to remove the node again?
>>>>
>>>> Please advise, as I don't want to lose any files.
>>>>
>>>> Thanks
>>>>
>>>> Kashif
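PS: so that I follow the next steps correctly, my understanding of the two options is sketched below; please correct me if re-running start does not retry the files that failed:

    # Option 1: retry the migration before committing anything
    gluster volume remove-brick atlasglust nodename:/glusteratlas/brick007/gv0 start
    gluster volume remove-brick atlasglust nodename:/glusteratlas/brick007/gv0 status

    # Option 2: commit now; from your earlier mail I understand the files still
    # left on brick007 would then no longer be accessible via the client
    gluster volume remove-brick atlasglust nodename:/glusteratlas/brick007/gv0 commit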
_______________________________________________
Gluster-users mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-users
