Hi All,

With the help of the Gluster community and the oVirt China community, my issue got resolved.
The root cause was twofold:

1. The glob operation takes quite a long time, longer than the ioprocess default timeout of 60s.
2. python-ioprocess was updated, so changing the configuration file alone no longer takes effect; the code itself has to be patched by hand (see steps 2 and 3 below).

Solution (needs to be done on all the hosts):

1. Add the ioprocess timeout value to /etc/vdsm/vdsm.conf:

------------
[irs]
process_pool_timeout = 180
-------------

2. Check /usr/share/vdsm/storage/outOfProcess.py, line 71, and see whether it still contains "IOProcess(DEFAULT_TIMEOUT)". If it does, changing the configuration file has no effect, because the timeout is now the third parameter of IOProcess.__init__(), not the second.

3. Change IOProcess(DEFAULT_TIMEOUT) to IOProcess(timeout=DEFAULT_TIMEOUT), remove the /usr/share/vdsm/storage/outOfProcess.pyc file, and restart the vdsm and supervdsm services on all hosts. (A short illustrative sketch of this change is at the bottom of this mail, below the quoted thread.)

Thanks,
Punit Dambiwal

On Mon, Mar 23, 2015 at 9:18 AM, Punit Dambiwal <[email protected]> wrote:

> Hi All,
>
> I am still facing the same issue... please help me to overcome it.
>
> Thanks,
> Punit
>
> On Fri, Mar 20, 2015 at 12:22 AM, Thomas Holkenbrink <[email protected]> wrote:
>
>> I've seen this before. The system thinks the storage system is up and
>> running and then attempts to utilize it.
>>
>> The way I got around it was to put a delay in the startup of the Gluster
>> node on the interface that the clients use to communicate.
>>
>> I use a bonded link; I then add a LINKDELAY to the interface to get the
>> underlying system up and running before the network comes up. This then
>> causes network-dependent features to wait for the network to finish.
>>
>> It adds about 10 seconds to the startup time. In our environment it works
>> well; you may not need as long a delay.
>>
>> CentOS
>>
>> [root@gls1 ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond0
>>
>> DEVICE=bond0
>> ONBOOT=yes
>> BOOTPROTO=static
>> USERCTL=no
>> NETMASK=255.255.248.0
>> IPADDR=10.10.1.17
>> MTU=9000
>> IPV6INIT=no
>> IPV6_AUTOCONF=no
>> NETWORKING_IPV6=no
>> NM_CONTROLLED=no
>> LINKDELAY=10
>> NAME="System Storage Bond0"
>>
>>
>> Hi Michal,
>>
>> The storage domain is up and running and mounted on all the host nodes...
>> as I mentioned before, it was working perfectly, but after the reboot I
>> cannot power the VMs on any more...
>>
>> [image: Inline image 1]
>>
>> [image: Inline image 2]
>>
>> [root@cpu01 log]# gluster volume info
>>
>> Volume Name: ds01
>> Type: Distributed-Replicate
>> Volume ID: 369d3fdc-c8eb-46b7-a33e-0a49f2451ff6
>> Status: Started
>> Number of Bricks: 48 x 2 = 96
>> Transport-type: tcp
>> Bricks:
>> Brick1: cpu01:/bricks/1/vol1
>> Brick2: cpu02:/bricks/1/vol1
>> Brick3: cpu03:/bricks/1/vol1
>> Brick4: cpu04:/bricks/1/vol1
>> Brick5: cpu01:/bricks/2/vol1
>> Brick6: cpu02:/bricks/2/vol1
>> Brick7: cpu03:/bricks/2/vol1
>> Brick8: cpu04:/bricks/2/vol1
>> Brick9: cpu01:/bricks/3/vol1
>> Brick10: cpu02:/bricks/3/vol1
>> Brick11: cpu03:/bricks/3/vol1
>> Brick12: cpu04:/bricks/3/vol1
>> Brick13: cpu01:/bricks/4/vol1
>> Brick14: cpu02:/bricks/4/vol1
>> Brick15: cpu03:/bricks/4/vol1
>> Brick16: cpu04:/bricks/4/vol1
>> Brick17: cpu01:/bricks/5/vol1
>> Brick18: cpu02:/bricks/5/vol1
>> Brick19: cpu03:/bricks/5/vol1
>> Brick20: cpu04:/bricks/5/vol1
>> Brick21: cpu01:/bricks/6/vol1
>> Brick22: cpu02:/bricks/6/vol1
>> Brick23: cpu03:/bricks/6/vol1
>> Brick24: cpu04:/bricks/6/vol1
>> Brick25: cpu01:/bricks/7/vol1
>> Brick26: cpu02:/bricks/7/vol1
>> Brick27: cpu03:/bricks/7/vol1
>> Brick28: cpu04:/bricks/7/vol1
>> Brick29: cpu01:/bricks/8/vol1
>> Brick30: cpu02:/bricks/8/vol1
>> Brick31: cpu03:/bricks/8/vol1
>> Brick32: cpu04:/bricks/8/vol1
>> Brick33: cpu01:/bricks/9/vol1
>> Brick34: cpu02:/bricks/9/vol1
>> Brick35: cpu03:/bricks/9/vol1
>> Brick36: cpu04:/bricks/9/vol1
>> Brick37: cpu01:/bricks/10/vol1
>> Brick38: cpu02:/bricks/10/vol1
>> Brick39: cpu03:/bricks/10/vol1
>> Brick40: cpu04:/bricks/10/vol1
>> Brick41: cpu01:/bricks/11/vol1
>> Brick42: cpu02:/bricks/11/vol1
>> Brick43: cpu03:/bricks/11/vol1
>> Brick44: cpu04:/bricks/11/vol1
>> Brick45: cpu01:/bricks/12/vol1
>> Brick46: cpu02:/bricks/12/vol1
>> Brick47: cpu03:/bricks/12/vol1
>> Brick48: cpu04:/bricks/12/vol1
>> Brick49: cpu01:/bricks/13/vol1
>> Brick50: cpu02:/bricks/13/vol1
>> Brick51: cpu03:/bricks/13/vol1
>> Brick52: cpu04:/bricks/13/vol1
>> Brick53: cpu01:/bricks/14/vol1
>> Brick54: cpu02:/bricks/14/vol1
>> Brick55: cpu03:/bricks/14/vol1
>> Brick56: cpu04:/bricks/14/vol1
>> Brick57: cpu01:/bricks/15/vol1
>> Brick58: cpu02:/bricks/15/vol1
>> Brick59: cpu03:/bricks/15/vol1
>> Brick60: cpu04:/bricks/15/vol1
>> Brick61: cpu01:/bricks/16/vol1
>> Brick62: cpu02:/bricks/16/vol1
>> Brick63: cpu03:/bricks/16/vol1
>> Brick64: cpu04:/bricks/16/vol1
>> Brick65: cpu01:/bricks/17/vol1
>> Brick66: cpu02:/bricks/17/vol1
>> Brick67: cpu03:/bricks/17/vol1
>> Brick68: cpu04:/bricks/17/vol1
>> Brick69: cpu01:/bricks/18/vol1
>> Brick70: cpu02:/bricks/18/vol1
>> Brick71: cpu03:/bricks/18/vol1
>> Brick72: cpu04:/bricks/18/vol1
>> Brick73: cpu01:/bricks/19/vol1
>> Brick74: cpu02:/bricks/19/vol1
>> Brick75: cpu03:/bricks/19/vol1
>> Brick76: cpu04:/bricks/19/vol1
>> Brick77: cpu01:/bricks/20/vol1
>> Brick78: cpu02:/bricks/20/vol1
>> Brick79: cpu03:/bricks/20/vol1
>> Brick80: cpu04:/bricks/20/vol1
>> Brick81: cpu01:/bricks/21/vol1
>> Brick82: cpu02:/bricks/21/vol1
>> Brick83: cpu03:/bricks/21/vol1
>> Brick84: cpu04:/bricks/21/vol1
>> Brick85: cpu01:/bricks/22/vol1
>> Brick86: cpu02:/bricks/22/vol1
>> Brick87: cpu03:/bricks/22/vol1
>> Brick88: cpu04:/bricks/22/vol1
>> Brick89: cpu01:/bricks/23/vol1
>> Brick90: cpu02:/bricks/23/vol1
>> Brick91: cpu03:/bricks/23/vol1
>> Brick92: cpu04:/bricks/23/vol1
>> Brick93: cpu01:/bricks/24/vol1
>> Brick94: cpu02:/bricks/24/vol1
>> Brick95: cpu03:/bricks/24/vol1
>> Brick96: cpu04:/bricks/24/vol1
>> Options Reconfigured:
>> diagnostics.count-fop-hits: on
>> diagnostics.latency-measurement: on
>> nfs.disable: on
>> user.cifs: enable
>> auth.allow: 10.10.0.*
>> performance.quick-read: off
>> performance.read-ahead: off
>> performance.io-cache: off
>> performance.stat-prefetch: off
>> cluster.eager-lock: enable
>> network.remote-dio: enable
>> cluster.quorum-type: auto
>> cluster.server-quorum-type: server
>> storage.owner-uid: 36
>> storage.owner-gid: 36
>> server.allow-insecure: on
>> network.ping-timeout: 100
>> [root@cpu01 log]#
>>
>> -----------------------------------------
>>
>> [root@cpu01 log]# gluster volume status
>> Status of volume: ds01
>> Gluster process                  Port    Online  Pid
>> ------------------------------------------------------------------------------
>> Brick cpu01:/bricks/1/vol1       49152   Y       33474
>> Brick cpu02:/bricks/1/vol1       49152   Y       40717
>> Brick cpu03:/bricks/1/vol1       49152   Y       18080
>> Brick cpu04:/bricks/1/vol1       49152   Y       40447
>> Brick cpu01:/bricks/2/vol1       49153   Y       33481
>> Brick cpu02:/bricks/2/vol1       49153   Y       40724
>> Brick cpu03:/bricks/2/vol1       49153   Y       18086
>> Brick cpu04:/bricks/2/vol1       49153   Y       40453
>> Brick cpu01:/bricks/3/vol1       49154   Y       33489
>> Brick cpu02:/bricks/3/vol1       49154   Y       40731
>> Brick cpu03:/bricks/3/vol1       49154   Y       18097
>> Brick cpu04:/bricks/3/vol1       49154   Y       40460
>> Brick cpu01:/bricks/4/vol1       49155   Y       33495
>> Brick cpu02:/bricks/4/vol1       49155   Y       40738
>> Brick cpu03:/bricks/4/vol1       49155   Y       18103
>> Brick cpu04:/bricks/4/vol1       49155   Y       40468
>> Brick cpu01:/bricks/5/vol1       49156   Y       33502
>> Brick cpu02:/bricks/5/vol1       49156   Y       40745
>> Brick cpu03:/bricks/5/vol1       49156   Y       18110
>> Brick cpu04:/bricks/5/vol1       49156   Y       40474
>> Brick cpu01:/bricks/6/vol1       49157   Y       33509
>> Brick cpu02:/bricks/6/vol1       49157   Y       40752
>> Brick cpu03:/bricks/6/vol1       49157   Y       18116
>> Brick cpu04:/bricks/6/vol1       49157   Y       40481
>> Brick cpu01:/bricks/7/vol1       49158   Y       33516
>> Brick cpu02:/bricks/7/vol1       49158   Y       40759
>> Brick cpu03:/bricks/7/vol1       49158   Y       18122
>> Brick cpu04:/bricks/7/vol1       49158   Y       40488
>> Brick cpu01:/bricks/8/vol1       49159   Y       33525
>> Brick cpu02:/bricks/8/vol1       49159   Y       40766
>> Brick cpu03:/bricks/8/vol1       49159   Y       18130
>> Brick cpu04:/bricks/8/vol1       49159   Y       40495
>> Brick cpu01:/bricks/9/vol1       49160   Y       33530
>> Brick cpu02:/bricks/9/vol1       49160   Y       40773
>> Brick cpu03:/bricks/9/vol1       49160   Y       18137
>> Brick cpu04:/bricks/9/vol1       49160   Y       40502
>> Brick cpu01:/bricks/10/vol1      49161   Y       33538
>> Brick cpu02:/bricks/10/vol1      49161   Y       40780
>> Brick cpu03:/bricks/10/vol1      49161   Y       18143
>> Brick cpu04:/bricks/10/vol1      49161   Y       40509
>> Brick cpu01:/bricks/11/vol1      49162   Y       33544
>> Brick cpu02:/bricks/11/vol1      49162   Y       40787
>> Brick cpu03:/bricks/11/vol1      49162   Y       18150
>> Brick cpu04:/bricks/11/vol1      49162   Y       40516
>> Brick cpu01:/bricks/12/vol1      49163   Y       33551
>> Brick cpu02:/bricks/12/vol1      49163   Y       40794
>> Brick cpu03:/bricks/12/vol1      49163   Y       18157
>> Brick cpu04:/bricks/12/vol1      49163   Y       40692
>> Brick cpu01:/bricks/13/vol1      49164   Y       33558
>> Brick cpu02:/bricks/13/vol1      49164   Y       40801
>> Brick cpu03:/bricks/13/vol1      49164   Y       18165
>> Brick cpu04:/bricks/13/vol1      49164   Y       40700
>> Brick cpu01:/bricks/14/vol1      49165   Y       33566
>> Brick cpu02:/bricks/14/vol1      49165   Y       40809
>> Brick cpu03:/bricks/14/vol1      49165   Y       18172
>> Brick cpu04:/bricks/14/vol1      49165   Y       40706
>> Brick cpu01:/bricks/15/vol1      49166   Y       33572
>> Brick cpu02:/bricks/15/vol1      49166   Y       40815
>> Brick cpu03:/bricks/15/vol1      49166   Y       18179
>> Brick cpu04:/bricks/15/vol1      49166   Y       40714
>> Brick cpu01:/bricks/16/vol1      49167   Y       33579
>> Brick cpu02:/bricks/16/vol1      49167   Y       40822
>> Brick cpu03:/bricks/16/vol1      49167   Y       18185
>> Brick cpu04:/bricks/16/vol1      49167   Y       40722
>> Brick cpu01:/bricks/17/vol1      49168   Y       33586
>> Brick cpu02:/bricks/17/vol1      49168   Y       40829
>> Brick cpu03:/bricks/17/vol1      49168   Y       18192
>> Brick cpu04:/bricks/17/vol1      49168   Y       40727
>> Brick cpu01:/bricks/18/vol1      49169   Y       33593
>> Brick cpu02:/bricks/18/vol1      49169   Y       40836
>> Brick cpu03:/bricks/18/vol1      49169   Y       18201
>> Brick cpu04:/bricks/18/vol1      49169   Y       40735
>> Brick cpu01:/bricks/19/vol1      49170   Y       33600
>> Brick cpu02:/bricks/19/vol1      49170   Y       40843
>> Brick cpu03:/bricks/19/vol1      49170   Y       18207
>> Brick cpu04:/bricks/19/vol1      49170   Y       40741
>> Brick cpu01:/bricks/20/vol1      49171   Y       33608
>> Brick cpu02:/bricks/20/vol1      49171   Y       40850
>> Brick cpu03:/bricks/20/vol1      49171   Y       18214
>> Brick cpu04:/bricks/20/vol1      49171   Y       40748
>> Brick cpu01:/bricks/21/vol1      49172   Y       33614
>> Brick cpu02:/bricks/21/vol1      49172   Y       40858
>> Brick cpu03:/bricks/21/vol1      49172   Y       18222
>> Brick cpu04:/bricks/21/vol1      49172   Y       40756
>> Brick cpu01:/bricks/22/vol1      49173   Y       33621
>> Brick cpu02:/bricks/22/vol1      49173   Y       40864
>> Brick cpu03:/bricks/22/vol1      49173   Y       18227
>> Brick cpu04:/bricks/22/vol1      49173   Y       40762
>> Brick cpu01:/bricks/23/vol1      49174   Y       33626
>> Brick cpu02:/bricks/23/vol1      49174   Y       40869
>> Brick cpu03:/bricks/23/vol1      49174   Y       18234
>> Brick cpu04:/bricks/23/vol1      49174   Y       40769
>> Brick cpu01:/bricks/24/vol1      49175   Y       33631
>> Brick cpu02:/bricks/24/vol1      49175   Y       40874
>> Brick cpu03:/bricks/24/vol1      49175   Y       18239
>> Brick cpu04:/bricks/24/vol1      49175   Y       40774
>> Self-heal Daemon on localhost    N/A     Y       33361
>> Self-heal Daemon on cpu05        N/A     Y       2353
>> Self-heal Daemon on cpu04        N/A     Y       40786
>> Self-heal Daemon on cpu02        N/A     Y       32442
>> Self-heal Daemon on cpu03        N/A     Y       18664
>>
>> Task Status of Volume ds01
>> ------------------------------------------------------------------------------
>> Task                 : Rebalance
>> ID                   : 5db24b30-4b9f-4b65-8910-a7a0a6d327a4
>> Status               : completed
>>
>> [root@cpu01 log]#
>>
>> [root@cpu01 log]# gluster pool list
>> UUID                                    Hostname        State
>> 626c9360-8c09-480f-9707-116e67cc38e6    cpu02           Connected
>> dc475d62-b035-4ee6-9006-6f03bf68bf24    cpu05           Connected
>> 41b5b2ff-3671-47b4-b477-227a107e718d    cpu03           Connected
>> c0afe114-dfa7-407d-bad7-5a3f97a6f3fc    cpu04           Connected
>> 9b61b0a5-be78-4ac2-b6c0-2db588da5c35    localhost       Connected
>> [root@cpu01 log]#
>>
>> [image: Inline image 3]
>>
>> Thanks,
>> Punit
>>
>> On Thu, Mar 19, 2015 at 2:53 PM, Michal Skrivanek <[email protected]> wrote:
>>
>> On Mar 19, 2015, at 03:18, Punit Dambiwal <[email protected]> wrote:
>>
>> > Hi All,
>> >
>> > Does anyone have any idea about this problem? It seems to be a bug either
>> > in oVirt or GlusterFS... that's why no one has an idea about it. Please
>> > correct me if I am wrong...
>>
>> Hi,
>> as I said, storage access times out, so it looks to me like a Gluster setup
>> problem; the storage domain you have your VMs on is not working...
>>
>> Thanks,
>> michal
>>
>> >
>> > Thanks,
>> > Punit
>> >
>> > On Wed, Mar 18, 2015 at 5:05 PM, Punit Dambiwal <[email protected]> wrote:
>> > Hi Michal,
>> >
>> > Would you mind letting me know the possible messed-up things? I will
>> > check and try to resolve it... I am still working with the Gluster
>> > community to resolve this issue...
>> >
>> > But in oVirt the Gluster setup is quite straightforward... so how could it
>> > get messed up by a reboot? If it can be messed up by a reboot, then it
>> > does not seem like a good and stable technology for production storage...
>> >
>> > Thanks,
>> > Punit
>> >
>> > On Wed, Mar 18, 2015 at 3:51 PM, Michal Skrivanek <[email protected]> wrote:
>> >
>> > On Mar 18, 2015, at 03:33, Punit Dambiwal <[email protected]> wrote:
>> >
>> > > Hi,
>> > >
>> > > Is there anyone from the community who can help me solve this issue?
>> > >
>> > > Thanks,
>> > > Punit
>> > >
>> > > On Tue, Mar 17, 2015 at 12:52 PM, Punit Dambiwal <[email protected]> wrote:
>> > > Hi,
>> > >
>> > > I am facing one strange issue with oVirt/GlusterFS... I still haven't
>> > > found out whether this issue is related to GlusterFS or oVirt...
>> > >
>> > > Ovirt :- 3.5.1
>> > > Glusterfs :- 3.6.1
>> > > Host :- 4 hosts (compute + storage)... each server has 24 bricks
>> > > Guest VM :- more than 100
>> > >
>> > > Issue :- When I deployed this cluster the first time, it worked well for
>> > > me (all the guest VMs were created and ran successfully)... but one day
>> > > one of my host nodes suddenly rebooted, and now none of the VMs can boot
>> > > up; they fail with the error "Bad Volume Specification".
>> > >
>> > > VMId :- d877313c18d9783ca09b62acf5588048
>> > >
>> > > VDSM Logs :- http://ur1.ca/jxabi
>> >
>> > you've got timeouts while accessing storage... so I guess something got
>> > messed up on reboot, it may also be just a gluster misconfiguration...
>> >
>> > > Engine Logs :- http://ur1.ca/jxabv
>> > >
>> > > ------------------------
>> > > [root@cpu01 ~]# vdsClient -s 0 getVolumeInfo
>> > > e732a82f-bae9-4368-8b98-dedc1c3814de 00000002-0002-0002-0002-000000000145
>> > > 6d123509-6867-45cf-83a2-6d679b77d3c5 9030bb43-6bc9-462f-a1b9-f6d5a02fb180
>> > >         status = OK
>> > >         domain = e732a82f-bae9-4368-8b98-dedc1c3814de
>> > >         capacity = 21474836480
>> > >         voltype = LEAF
>> > >         description =
>> > >         parent = 00000000-0000-0000-0000-000000000000
>> > >         format = RAW
>> > >         image = 6d123509-6867-45cf-83a2-6d679b77d3c5
>> > >         uuid = 9030bb43-6bc9-462f-a1b9-f6d5a02fb180
>> > >         disktype = 2
>> > >         legality = LEGAL
>> > >         mtime = 0
>> > >         apparentsize = 21474836480
>> > >         truesize = 4562972672
>> > >         type = SPARSE
>> > >         children = []
>> > >         pool =
>> > >         ctime = 1422676305
>> > > ---------------------
>> > >
>> > > I opened the same thread earlier but didn't get any definitive answers,
>> > > so I am reopening it...
>> > >
>> > > https://www.mail-archive.com/[email protected]/msg25011.html
>> > >
>> > > Thanks,
>> > > Punit
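P.S. For anyone hitting the same thing, here is a minimal, self-contained
sketch of why step 3 above matters. It is illustrative only: it does not
import the real ioprocess package, and FakeIOProcess merely mimics the
assumed shape of IOProcess.__init__() after the python-ioprocess update
(timeout moved from the second to the third positional parameter), so the
parameter names below are assumptions, not the real signature.

------------
# Illustrative mock of the assumed post-update IOProcess.__init__() shape,
# where "timeout" is now the third positional parameter instead of the second.
DEFAULT_TIMEOUT = 180


class FakeIOProcess(object):
    def __init__(self, max_threads=10, max_queued_requests=-1, timeout=60):
        # In the real class these would configure the helper process; here
        # they only show which slot a positional argument lands in.
        self.max_threads = max_threads
        self.max_queued_requests = max_queued_requests
        self.timeout = timeout


# Old-style call as in outOfProcess.py line 71: the value meant as a timeout
# lands in the first parameter, and the timeout silently stays at 60s.
broken = FakeIOProcess(DEFAULT_TIMEOUT)
print(broken.timeout)   # prints 60 -- the vdsm.conf change has no effect

# Fixed call: passing the value by keyword always reaches the right slot,
# no matter where "timeout" sits in the signature.
fixed = FakeIOProcess(timeout=DEFAULT_TIMEOUT)
print(fixed.timeout)    # prints 180
-------------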
_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users
