This system is going into production. I will try to replicate this problem on the next installation.
On Wed, Jan 2, 2019 at 9:25 PM Raghavendra Gowdappa <rgowd...@redhat.com> wrote:

>
> On Wed, Jan 2, 2019 at 9:59 PM Dmitry Isakbayev <isak...@gmail.com> wrote:
>
>> Still no JVM crashes. Is it possible that running glusterfs with
>> performance options turned off for a couple of days cleared out the "stale
>> metadata issue"?
>
> Turning these options off would've cleared the existing cache, and hence
> the previous stale metadata would've been cleared. Hitting stale metadata
> again depends on races. That might be the reason you are still not seeing
> the issue. Can you try with all perf xlators enabled (the default
> configuration)?
>
>> On Mon, Dec 31, 2018 at 1:38 PM Dmitry Isakbayev <isak...@gmail.com> wrote:
>>
>>> The software ran with all of the options turned off over the weekend
>>> without any problems.
>>> I will try to collect the debug info for you. I have re-enabled the
>>> three options, but have yet to see the problem reoccur.
>>>
>>> On Sat, Dec 29, 2018 at 6:46 PM Raghavendra Gowdappa <rgowd...@redhat.com> wrote:
>>>
>>>> Thanks Dmitry. Can you provide the following debug info I asked for
>>>> earlier:
>>>>
>>>> * strace -ff -v ... of the java application
>>>> * a dump of the I/O traffic seen by the mountpoint (use --dump-fuse
>>>>   while mounting)
>>>>
>>>> regards,
>>>> Raghavendra
>>>>
>>>> On Sat, Dec 29, 2018 at 2:08 AM Dmitry Isakbayev <isak...@gmail.com> wrote:
>>>>
>>>>> These 3 options seem to trigger both (reading zip file and renaming
>>>>> files) problems.
>>>>>
>>>>> Options Reconfigured:
>>>>> performance.io-cache: off
>>>>> performance.stat-prefetch: off
>>>>> performance.quick-read: off
>>>>> performance.parallel-readdir: off
>>>>> *performance.readdir-ahead: on*
>>>>> *performance.write-behind: on*
>>>>> *performance.read-ahead: on*
>>>>> performance.client-io-threads: off
>>>>> nfs.disable: on
>>>>> transport.address-family: inet
>>>>>
>>>>> On Fri, Dec 28, 2018 at 10:24 AM Dmitry Isakbayev <isak...@gmail.com> wrote:
>>>>>
>>>>>> Turning a single option on at a time still worked fine. I will keep
>>>>>> trying.
>>>>>>
>>>>>> We had used 4.1.5 on KVM/CentOS 7.5 at AWS without these issues or
>>>>>> log messages. Do you suppose these issues are triggered by the new
>>>>>> environment, or did they not exist in 4.1.5?
>>>>>>
>>>>>> [root@node1 ~]# glusterfs --version
>>>>>> glusterfs 4.1.5
>>>>>>
>>>>>> On AWS, using
>>>>>> [root@node1 ~]# hostnamectl
>>>>>>   Static hostname: node1
>>>>>>         Icon name: computer-vm
>>>>>>           Chassis: vm
>>>>>>        Machine ID: b30d0f2110ac3807b210c19ede3ce88f
>>>>>>           Boot ID: 52bb159a0aa94043a40e7c7651967bd9
>>>>>>    Virtualization: kvm
>>>>>>  Operating System: CentOS Linux 7 (Core)
>>>>>>       CPE OS Name: cpe:/o:centos:centos:7
>>>>>>            Kernel: Linux 3.10.0-862.3.2.el7.x86_64
>>>>>>      Architecture: x86-64
>>>>>>
>>>>>> On Fri, Dec 28, 2018 at 8:56 AM Raghavendra Gowdappa <rgowd...@redhat.com> wrote:
>>>>>>
>>>>>>> On Fri, Dec 28, 2018 at 7:23 PM Dmitry Isakbayev <isak...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Ok. I will try different options.
>>>>>>>>
>>>>>>>> This system is scheduled to go into production soon. What version
>>>>>>>> would you recommend rolling back to?
>>>>>>>
>>>>>>> These are long-standing issues, so rolling back may not make them go
>>>>>>> away. Instead, if the performance is agreeable to you, please keep
>>>>>>> these xlators off in production.
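The one-option-at-a-time bisection described above maps onto plain `gluster volume set` calls. A rough sketch, assuming the volume name `gv0` that appears later in the thread; it is meant to be run interactively on a test system, exercising the Java workload at each step:

```shell
# Starting from the all-off configuration, enable one suspect xlator at a
# time, exercise the workload, then turn it back off before trying the next.
for opt in performance.readdir-ahead performance.write-behind performance.read-ahead; do
    gluster volume set gv0 "$opt" on
    echo ">>> $opt enabled - run the java workload now and watch for crashes"
    read -r _    # wait for operator confirmation before moving on
    gluster volume set gv0 "$opt" off
done

# To return all reconfigured options to their defaults in one step:
# gluster volume reset gv0
```

If the crash only reproduces with a particular option on, that names the offending xlator, which is exactly the data point requested later in the thread.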
>>>>>>>
>>>>>>>> On Thu, Dec 27, 2018 at 10:55 PM Raghavendra Gowdappa <rgowd...@redhat.com> wrote:
>>>>>>>>
>>>>>>>>> On Fri, Dec 28, 2018 at 3:13 AM Dmitry Isakbayev <isak...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Raghavendra,
>>>>>>>>>>
>>>>>>>>>> Thanks for the suggestion.
>>>>>>>>>>
>>>>>>>>>> I am using
>>>>>>>>>>
>>>>>>>>>> [root@jl-fanexoss1p glusterfs]# gluster --version
>>>>>>>>>> glusterfs 5.0
>>>>>>>>>>
>>>>>>>>>> On
>>>>>>>>>> [root@jl-fanexoss1p glusterfs]# hostnamectl
>>>>>>>>>>         Icon name: computer-vm
>>>>>>>>>>           Chassis: vm
>>>>>>>>>>        Machine ID: e44b8478ef7a467d98363614f4e50535
>>>>>>>>>>           Boot ID: eed98992fdda4c88bdd459a89101766b
>>>>>>>>>>    Virtualization: vmware
>>>>>>>>>>  Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)
>>>>>>>>>>       CPE OS Name: cpe:/o:redhat:enterprise_linux:7.5:GA:server
>>>>>>>>>>            Kernel: Linux 3.10.0-862.14.4.el7.x86_64
>>>>>>>>>>      Architecture: x86-64
>>>>>>>>>>
>>>>>>>>>> I have configured the following options
>>>>>>>>>>
>>>>>>>>>> [root@jl-fanexoss1p glusterfs]# gluster volume info
>>>>>>>>>> Volume Name: gv0
>>>>>>>>>> Type: Replicate
>>>>>>>>>> Volume ID: 5ffbda09-c5e2-4abc-b89e-79b5d8a40824
>>>>>>>>>> Status: Started
>>>>>>>>>> Snapshot Count: 0
>>>>>>>>>> Number of Bricks: 1 x 3 = 3
>>>>>>>>>> Transport-type: tcp
>>>>>>>>>> Bricks:
>>>>>>>>>> Brick1: jl-fanexoss1p.cspire.net:/data/brick1/gv0
>>>>>>>>>> Brick2: sl-fanexoss2p.cspire.net:/data/brick1/gv0
>>>>>>>>>> Brick3: nxquorum1p.cspire.net:/data/brick1/gv0
>>>>>>>>>> Options Reconfigured:
>>>>>>>>>> performance.io-cache: off
>>>>>>>>>> performance.stat-prefetch: off
>>>>>>>>>> performance.quick-read: off
>>>>>>>>>> performance.parallel-readdir: off
>>>>>>>>>> performance.readdir-ahead: off
>>>>>>>>>> performance.write-behind: off
>>>>>>>>>> performance.read-ahead: off
>>>>>>>>>> performance.client-io-threads: off
>>>>>>>>>> nfs.disable: on
>>>>>>>>>> transport.address-family: inet
>>>>>>>>>>
>>>>>>>>>> I don't know if it is related, but I am seeing a lot of
>>>>>>>>>>
>>>>>>>>>> [2018-12-27 20:19:23.776080] W [MSGID: 114031]
>>>>>>>>>> [client-rpc-fops_v2.c:1932:client4_0_seek_cbk] 2-gv0-client-0: remote
>>>>>>>>>> operation failed [No such device or address]
>>>>>>>>>> [2018-12-27 20:19:47.735190] E [MSGID: 101191]
>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to
>>>>>>>>>> dispatch handler
>>>>>>>>>
>>>>>>>>> These msgs were introduced by patch [1]. To the best of my knowledge
>>>>>>>>> they are benign. We'll be sending a patch to fix these msgs though.
>>>>>>>>>
>>>>>>>>> +Mohit Agrawal <moagr...@redhat.com> +Milind Changire
>>>>>>>>> <mchan...@redhat.com>. Can you try to identify why we are seeing
>>>>>>>>> these messages? If possible, please send a patch to fix this.
>>>>>>>>>
>>>>>>>>> [1] https://review.gluster.org/r/I578c3fc67713f4234bd3abbec5d3fbba19059ea5
>>>>>>>>>
>>>>>>>>>> And java.io exceptions trying to rename files.
>>>>>>>>>
>>>>>>>>> When you see the errors, is it possible to collect
>>>>>>>>>
>>>>>>>>> * an strace of the java application (strace -ff -v ...)
>>>>>>>>> * a fuse-dump of the glusterfs mount (use the option --dump-fuse
>>>>>>>>>   while mounting)?
>>>>>>>>>
>>>>>>>>> I also need another favour from you. By trial and error, can you
>>>>>>>>> point out which of the many performance xlators you've turned off
>>>>>>>>> is causing the issue?
>>>>>>>>>
>>>>>>>>> The above two data-points will help us to fix the problem.
>>>>>>>>>
>>>>>>>>>> Thank you,
>>>>>>>>>> Dmitry
>>>>>>>>>>
>>>>>>>>>> On Thu, Dec 27, 2018 at 3:48 PM Raghavendra Gowdappa <rgowd...@redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> What version of glusterfs are you using? It might be either
>>>>>>>>>>>
>>>>>>>>>>> * a stale metadata issue, or
>>>>>>>>>>> * an inconsistent ctime issue.
>>>>>>>>>>>
>>>>>>>>>>> Can you try turning off all performance xlators?
>>>>>>>>>>> If the issue is
>>>>>>>>>>> the first one (stale metadata), that should help.
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Dec 28, 2018 at 1:51 AM Dmitry Isakbayev <isak...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Attempted to set `performance.read-ahead off` according to
>>>>>>>>>>>> https://jira.apache.org/jira/browse/AMQ-7041
>>>>>>>>>>>> That did not help.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Dec 24, 2018 at 2:11 PM Dmitry Isakbayev <isak...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> The core file generated by the JVM suggests that it happens
>>>>>>>>>>>>> because the file is changing while it is being read -
>>>>>>>>>>>>> https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8186557.
>>>>>>>>>>>>> The application reads in the zip file and goes through the zip
>>>>>>>>>>>>> entries, then reloads the file and goes through the zip entries
>>>>>>>>>>>>> again. It does so 3 times. The application never crashes on the
>>>>>>>>>>>>> 1st cycle, but sometimes crashes on the 2nd or 3rd cycle.
>>>>>>>>>>>>> The zip file is generated about 20 seconds prior to it being
>>>>>>>>>>>>> used and is not updated or even used by any other application. I
>>>>>>>>>>>>> have never seen this problem on a plain file system.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would appreciate any suggestions on how to go about debugging
>>>>>>>>>>>>> this issue. I can change the source code of the java application.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Dmitry
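One cheap way to narrow down the "file is changing while it is being read" symptom from the shell is to read the archive twice in a row and compare checksums. A minimal sketch; `ZIP` is a placeholder path, so point it at the real archive under the glusterfs mount when reproducing (for a dry run it falls back to creating a local demo file):

```shell
# Read the same file twice and compare the bytes returned by each read.
ZIP=${ZIP:-/tmp/reread-demo.zip}
[ -e "$ZIP" ] || printf 'demo-bytes' > "$ZIP"   # local demo file for a dry run

first=$(sha256sum "$ZIP" | awk '{print $1}')
second=$(sha256sum "$ZIP" | awk '{print $1}')

if [ "$first" = "$second" ]; then
    echo "consistent: both reads returned identical bytes"
else
    echo "INCONSISTENT: file content changed between reads"
fi
```

On a healthy filesystem the two checksums always match. If this ever prints INCONSISTENT on the mount while no writer touches the file, that would corroborate the stale-data theory independently of the JVM.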
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
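For reference, the two debug captures requested throughout the thread (an strace of the Java process and a FUSE traffic dump of the mount) can be gathered roughly as follows. The server name, volume name, mount point, and jar path below are placeholders, not values from this thread:

```shell
# 1. Remount the client with a FUSE traffic dump: every FUSE request/reply
#    seen by the mountpoint is written to the given file.
umount /mnt/gv0
glusterfs --volfile-server=server1 --volfile-id=gv0 \
          --dump-fuse=/tmp/gv0-fuse.dump /mnt/gv0

# 2. Run the java application under strace: -ff follows forks with one
#    output file per process/thread, -v prints unabridged structs, and -o
#    sets the output file prefix.
strace -ff -v -o /tmp/app-strace java -jar app.jar
```

Correlating timestamps between the strace output and the FUSE dump is what lets the developers see which syscall produced which FUSE operation when the rename or zip-read failure occurs.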