Still no JVM crashes. Is it possible that running glusterfs with the performance options turned off for a couple of days cleared out the "stale metadata issue"?
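If another round of testing with the suspect xlators is needed, they can be flipped back on one at a time. A minimal sketch, assuming the volume name gv0 from the `gluster volume info` output further down the thread:

```shell
# Sketch: re-enable the three suspect performance xlators one at a time,
# assuming the volume name gv0 used elsewhere in this thread.
for opt in performance.readdir-ahead performance.write-behind performance.read-ahead; do
    gluster volume set gv0 "$opt" on
    echo "enabled $opt - exercise the java workload, watch for crashes"
    # gluster volume set gv0 "$opt" off   # revert before trying the next option
done
```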
On Mon, Dec 31, 2018 at 1:38 PM Dmitry Isakbayev <isak...@gmail.com> wrote:

> The software ran with all of the options turned off over the weekend
> without any problems.
> I will try to collect the debug info for you. I have re-enabled the three
> options, but have yet to see the problem recur.
>
> On Sat, Dec 29, 2018 at 6:46 PM Raghavendra Gowdappa <rgowd...@redhat.com>
> wrote:
>
>> Thanks, Dmitry. Can you provide the debug info I asked for earlier:
>>
>> * strace -ff -v ... of the java application
>> * a dump of the I/O traffic seen by the mountpoint (use --dump-fuse
>> while mounting)
>>
>> regards,
>> Raghavendra
>>
>> On Sat, Dec 29, 2018 at 2:08 AM Dmitry Isakbayev <isak...@gmail.com>
>> wrote:
>>
>>> These 3 options seem to trigger both problems (reading the zip file
>>> and renaming files).
>>>
>>> Options Reconfigured:
>>> performance.io-cache: off
>>> performance.stat-prefetch: off
>>> performance.quick-read: off
>>> performance.parallel-readdir: off
>>> *performance.readdir-ahead: on*
>>> *performance.write-behind: on*
>>> *performance.read-ahead: on*
>>> performance.client-io-threads: off
>>> nfs.disable: on
>>> transport.address-family: inet
>>>
>>> On Fri, Dec 28, 2018 at 10:24 AM Dmitry Isakbayev <isak...@gmail.com>
>>> wrote:
>>>
>>>> Turning a single option on at a time still worked fine. I will keep
>>>> trying.
>>>>
>>>> We had used 4.1.5 on KVM/CentOS 7.5 at AWS without these issues or
>>>> log messages. Do you suppose these issues are triggered by the new
>>>> environment, or did they not exist in 4.1.5?
>>>>
>>>> [root@node1 ~]# glusterfs --version
>>>> glusterfs 4.1.5
>>>>
>>>> On AWS using
>>>> [root@node1 ~]# hostnamectl
>>>> Static hostname: node1
>>>> Icon name: computer-vm
>>>> Chassis: vm
>>>> Machine ID: b30d0f2110ac3807b210c19ede3ce88f
>>>> Boot ID: 52bb159a0aa94043a40e7c7651967bd9
>>>> Virtualization: kvm
>>>> Operating System: CentOS Linux 7 (Core)
>>>> CPE OS Name: cpe:/o:centos:centos:7
>>>> Kernel: Linux 3.10.0-862.3.2.el7.x86_64
>>>> Architecture: x86-64
>>>>
>>>> On Fri, Dec 28, 2018 at 8:56 AM Raghavendra Gowdappa <
>>>> rgowd...@redhat.com> wrote:
>>>>
>>>>> On Fri, Dec 28, 2018 at 7:23 PM Dmitry Isakbayev <isak...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Ok. I will try different options.
>>>>>>
>>>>>> This system is scheduled to go into production soon. What version
>>>>>> would you recommend rolling back to?
>>>>>>
>>>>> These are long-standing issues, so rolling back may not make them go
>>>>> away. Instead, if the performance is acceptable to you, please keep
>>>>> these xlators off in production.
>>>>>
>>>>>> On Thu, Dec 27, 2018 at 10:55 PM Raghavendra Gowdappa <
>>>>>> rgowd...@redhat.com> wrote:
>>>>>>
>>>>>>> On Fri, Dec 28, 2018 at 3:13 AM Dmitry Isakbayev <isak...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Raghavendra,
>>>>>>>>
>>>>>>>> Thanks for the suggestion.
>>>>>>>>
>>>>>>>> I am using
>>>>>>>>
>>>>>>>> [root@jl-fanexoss1p glusterfs]# gluster --version
>>>>>>>> glusterfs 5.0
>>>>>>>>
>>>>>>>> On
>>>>>>>> [root@jl-fanexoss1p glusterfs]# hostnamectl
>>>>>>>> Icon name: computer-vm
>>>>>>>> Chassis: vm
>>>>>>>> Machine ID: e44b8478ef7a467d98363614f4e50535
>>>>>>>> Boot ID: eed98992fdda4c88bdd459a89101766b
>>>>>>>> Virtualization: vmware
>>>>>>>> Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)
>>>>>>>> CPE OS Name: cpe:/o:redhat:enterprise_linux:7.5:GA:server
>>>>>>>> Kernel: Linux 3.10.0-862.14.4.el7.x86_64
>>>>>>>> Architecture: x86-64
>>>>>>>>
>>>>>>>> I have configured the following options:
>>>>>>>>
>>>>>>>> [root@jl-fanexoss1p glusterfs]# gluster volume info
>>>>>>>> Volume Name: gv0
>>>>>>>> Type: Replicate
>>>>>>>> Volume ID: 5ffbda09-c5e2-4abc-b89e-79b5d8a40824
>>>>>>>> Status: Started
>>>>>>>> Snapshot Count: 0
>>>>>>>> Number of Bricks: 1 x 3 = 3
>>>>>>>> Transport-type: tcp
>>>>>>>> Bricks:
>>>>>>>> Brick1: jl-fanexoss1p.cspire.net:/data/brick1/gv0
>>>>>>>> Brick2: sl-fanexoss2p.cspire.net:/data/brick1/gv0
>>>>>>>> Brick3: nxquorum1p.cspire.net:/data/brick1/gv0
>>>>>>>> Options Reconfigured:
>>>>>>>> performance.io-cache: off
>>>>>>>> performance.stat-prefetch: off
>>>>>>>> performance.quick-read: off
>>>>>>>> performance.parallel-readdir: off
>>>>>>>> performance.readdir-ahead: off
>>>>>>>> performance.write-behind: off
>>>>>>>> performance.read-ahead: off
>>>>>>>> performance.client-io-threads: off
>>>>>>>> nfs.disable: on
>>>>>>>> transport.address-family: inet
>>>>>>>>
>>>>>>>> I don't know if it is related, but I am seeing a lot of
>>>>>>>> [2018-12-27 20:19:23.776080] W [MSGID: 114031]
>>>>>>>> [client-rpc-fops_v2.c:1932:client4_0_seek_cbk] 2-gv0-client-0:
>>>>>>>> remote operation failed [No such device or address]
>>>>>>>> [2018-12-27 20:19:47.735190] E [MSGID: 101191]
>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to
>>>>>>>> dispatch
>>>>>>>> handler
>>>>>>>>
>>>>>>> These messages were introduced by patch [1]. To the best of my
>>>>>>> knowledge they are benign. We'll be sending a patch to fix these
>>>>>>> messages, though.
>>>>>>>
>>>>>>> +Mohit Agrawal <moagr...@redhat.com> +Milind Changire
>>>>>>> <mchan...@redhat.com> . Can you try to identify why we are seeing
>>>>>>> these messages? If possible, please send a patch to fix this.
>>>>>>>
>>>>>>> [1]
>>>>>>> https://review.gluster.org/r/I578c3fc67713f4234bd3abbec5d3fbba19059ea5
>>>>>>>
>>>>>>>> And java.io exceptions trying to rename files.
>>>>>>>>
>>>>>>> When you see the errors, is it possible to collect
>>>>>>> * an strace of the java application (strace -ff -v ...)
>>>>>>> * a fuse-dump of the glusterfs mount (use the option --dump-fuse
>>>>>>> while mounting)?
>>>>>>>
>>>>>>> I also need another favour from you. By trial and error, can you
>>>>>>> point out which of the many performance xlators you've turned off
>>>>>>> is causing the issue?
>>>>>>>
>>>>>>> The above two data points will help us fix the problem.
>>>>>>>
>>>>>>>> Thank you,
>>>>>>>> Dmitry
>>>>>>>>
>>>>>>>> On Thu, Dec 27, 2018 at 3:48 PM Raghavendra Gowdappa <
>>>>>>>> rgowd...@redhat.com> wrote:
>>>>>>>>
>>>>>>>>> What version of glusterfs are you using? It might be either
>>>>>>>>> * a stale metadata issue, or
>>>>>>>>> * an inconsistent ctime issue.
>>>>>>>>>
>>>>>>>>> Can you try turning off all performance xlators? If the issue is
>>>>>>>>> the first one, that should help.
>>>>>>>>>
>>>>>>>>> On Fri, Dec 28, 2018 at 1:51 AM Dmitry Isakbayev <
>>>>>>>>> isak...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Attempted to set `performance.read-ahead off` according to
>>>>>>>>>> https://jira.apache.org/jira/browse/AMQ-7041
>>>>>>>>>> That did not help.
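For what it's worth, the two captures requested above could be gathered along these lines; the mount point, volfile server, and output paths below are placeholders, not taken from the thread:

```shell
# Sketch: collect the debug artifacts Raghavendra asked for.
# Mount point, server name, and output paths are hypothetical.
mkdir -p /tmp/gluster-debug

# 1. Remount the volume with --dump-fuse to record the FUSE traffic
#    seen by the mountpoint.
glusterfs --volfile-server=server1 --volfile-id=gv0 \
          --dump-fuse=/tmp/gluster-debug/fuse.dump /mnt/gv0

# 2. Run the java application under strace; -ff writes one trace file
#    per thread, -v prints unabbreviated argument structures.
strace -ff -v -o /tmp/gluster-debug/java.strace java -jar /path/to/app.jar
```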
>>>>>>>>>>
>>>>>>>>>> On Mon, Dec 24, 2018 at 2:11 PM Dmitry Isakbayev <
>>>>>>>>>> isak...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> The core file generated by the JVM suggests that it happens
>>>>>>>>>>> because the file is changing while it is being read -
>>>>>>>>>>> https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8186557.
>>>>>>>>>>> The application reads in the zip file and goes through the zip
>>>>>>>>>>> entries, then reloads the file and goes through the zip entries
>>>>>>>>>>> again. It does so 3 times. The application never crashes on the
>>>>>>>>>>> 1st cycle but sometimes crashes on the 2nd or 3rd cycle.
>>>>>>>>>>> The zip file is generated about 20 seconds prior to it being
>>>>>>>>>>> used and is not updated or even used by any other application.
>>>>>>>>>>> I have never seen this problem on a plain file system.
>>>>>>>>>>>
>>>>>>>>>>> I would appreciate any suggestions on how to go about debugging
>>>>>>>>>>> this issue. I can change the source code of the java
>>>>>>>>>>> application.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Dmitry
>>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>> Gluster-users@gluster.org
>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
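One cheap way to confirm (or rule out) the file's bytes actually changing between the read cycles is to checksum it before each pass. A sketch; the zip path in the usage comment is a placeholder:

```shell
# Sketch: detect whether a file's contents change across three
# consecutive reads, mirroring the application's three read cycles.
check_stable() {
    f=$1
    first=$(md5sum "$f" | awk '{print $1}')
    for pass in 1 2 3; do
        sum=$(md5sum "$f" | awk '{print $1}')
        if [ "$sum" != "$first" ]; then
            echo "changed on pass $pass"
            return 1
        fi
    done
    echo "stable"
}

# Example (placeholder path on the gluster mount):
#   check_stable /mnt/gv0/build.zip
```

If this reports "stable" while the JVM still crashes, that would point at cached metadata (size/mtime seen by the client) rather than the data itself being inconsistent.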
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users