Hey folks,

Any update on getting this fix merged? We suspect this bug is behind other
crashes we are seeing.

Thanks,

Chris

On Tue, Jan 13, 2015 at 7:09 AM, Gregory Farnum <[email protected]> wrote:

> Awesome, thanks for the bug report and the fix, guys. :)
> -Greg
>
> On Mon, Jan 12, 2015 at 11:18 PM, 严正 <[email protected]> wrote:
> > I tracked down the bug. Please try the attached patch.
> >
> > Regards
> > Yan, Zheng
> >
> >
> >
> >
> >> On Jan 13, 2015, at 07:40, Gregory Farnum <[email protected]> wrote:
> >>
> >> Zheng, this looks like a kernel client issue to me, or else something
> >> funny is going on with the cap flushing and the timestamps (note how
> >> the reading client's ctime is set to an even second, while the mtime
> >> is ~.63 seconds later and matches what the writing client sees). Any
> >> ideas?
> >> -Greg
> >>
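Greg's observation can be checked mechanically. As a minimal sketch, assuming Python 3 on a client with the CephFS mount (the helper names `ctime_looks_truncated` and `check_file` are hypothetical, not part of Ceph), the following flags the pattern in the stat output from machine 2: ctime equal to mtime with its nanoseconds dropped, and therefore earlier than mtime.

```python
import os
import sys

NS_PER_SEC = 1_000_000_000


def ctime_looks_truncated(mtime_ns: int, ctime_ns: int) -> bool:
    """True when ctime equals mtime with the sub-second part dropped.

    Ordinarily a data write updates mtime and ctime together, so ctime is
    not earlier than mtime. The reading client in this thread reported
    ctime 20:06:09.000000000 against mtime 20:06:09.637234229.
    """
    whole_seconds = mtime_ns - mtime_ns % NS_PER_SEC
    return ctime_ns == whole_seconds and ctime_ns < mtime_ns


def check_file(path: str) -> bool:
    """Stat `path` and apply ctime_looks_truncated to its timestamps."""
    st = os.stat(path)
    return ctime_looks_truncated(st.st_mtime_ns, st.st_ctime_ns)


if __name__ == "__main__":
    # Usage: python check_ctime.py FILE [FILE ...]
    for p in sys.argv[1:]:
        print(p, "suspicious" if check_file(p) else "ok")
```

Run against the same path on both machines; given the stat output in this thread, only the reading client (machine 2) would be flagged.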
> >> On Mon, Jan 12, 2015 at 12:19 PM, Lorieri <[email protected]> wrote:
> >>> Hi Gregory,
> >>>
> >>>
> >>> $ uname -a
> >>> Linux coreos2 3.17.7+ #2 SMP Tue Jan 6 08:22:04 UTC 2015 x86_64 Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz GenuineIntel GNU/Linux
> >>>
> >>>
> >>> Kernel client, using `mount -t ceph ...`
> >>>
> >>>
> >>> core@coreos2 /var/run/systemd/system $ modinfo ceph
> >>> filename:       /lib/modules/3.17.7+/kernel/fs/ceph/ceph.ko
> >>> license:        GPL
> >>> description:    Ceph filesystem for Linux
> >>> author:         Patience Warnick <[email protected]>
> >>> author:         Yehuda Sadeh <[email protected]>
> >>> author:         Sage Weil <[email protected]>
> >>> alias:          fs-ceph
> >>> depends:        libceph
> >>> intree:         Y
> >>> vermagic:       3.17.7+ SMP mod_unload
> >>> signer:         Magrathea: Glacier signing key
> >>> sig_key:        D4:BB:DE:E9:C6:D8:FC:90:9F:23:59:B2:19:1B:B8:FA:57:A1:AF:D2
> >>> sig_hashalgo:   sha256
> >>>
> >>> core@coreos2 /var/run/systemd/system $ modinfo libceph
> >>> filename:       /lib/modules/3.17.7+/kernel/net/ceph/libceph.ko
> >>> license:        GPL
> >>> description:    Ceph filesystem for Linux
> >>> author:         Patience Warnick <[email protected]>
> >>> author:         Yehuda Sadeh <[email protected]>
> >>> author:         Sage Weil <[email protected]>
> >>> depends:        libcrc32c
> >>> intree:         Y
> >>> vermagic:       3.17.7+ SMP mod_unload
> >>> signer:         Magrathea: Glacier signing key
> >>> sig_key:        D4:BB:DE:E9:C6:D8:FC:90:9F:23:59:B2:19:1B:B8:FA:57:A1:AF:D2
> >>> sig_hashalgo:   sha256
> >>>
> >>>
> >>>
> >>> Ceph is installed in Ubuntu containers (same kernel):
> >>>
> >>> $ dpkg -l |grep ceph
> >>>
> >>> ii  ceph                             0.87-1trusty  amd64  distributed storage and file system
> >>> ii  ceph-common                      0.87-1trusty  amd64  common utilities to mount and interact with a ceph storage cluster
> >>> ii  ceph-fs-common                   0.87-1trusty  amd64  common utilities to mount and interact with a ceph file system
> >>> ii  ceph-fuse                        0.87-1trusty  amd64  FUSE-based client for the Ceph distributed file system
> >>> ii  ceph-mds                         0.87-1trusty  amd64  metadata server for the ceph distributed file system
> >>> ii  libcephfs1                       0.87-1trusty  amd64  Ceph distributed file system client library
> >>> ii  python-ceph                      0.87-1trusty  amd64  Python libraries for the Ceph distributed filesystem
> >>>
> >>>
> >>>
> >>> Reproducing the error:
> >>>
> >>> On machine 1:
> >>> core@coreos1 /var/lib/deis/store/logs $ > test.log
> >>> core@coreos1 /var/lib/deis/store/logs $ echo 1 > test.log
> >>> core@coreos1 /var/lib/deis/store/logs $ stat test.log
> >>>  File: 'test.log'
> >>>  Size: 2         Blocks: 1          IO Block: 4194304 regular file
> >>> Device: 0h/0d Inode: 1099511629882  Links: 1
> >>> Access: (0644/-rw-r--r--)  Uid: (  500/    core)   Gid: (  500/    core)
> >>> Access: 2015-01-12 20:05:03.000000000 +0000
> >>> Modify: 2015-01-12 20:06:09.637234229 +0000
> >>> Change: 2015-01-12 20:06:09.637234229 +0000
> >>> Birth: -
> >>>
> >>> On machine 2:
> >>> core@coreos2 /var/lib/deis/store/logs $ stat test.log
> >>>  File: 'test.log'
> >>>  Size: 2         Blocks: 1          IO Block: 4194304 regular file
> >>> Device: 0h/0d Inode: 1099511629882  Links: 1
> >>> Access: (0644/-rw-r--r--)  Uid: (  500/    core)   Gid: (  500/    core)
> >>> Access: 2015-01-12 20:05:03.000000000 +0000
> >>> Modify: 2015-01-12 20:06:09.637234229 +0000
> >>> Change: 2015-01-12 20:06:09.000000000 +0000
> >>> Birth: -
> >>>
> >>>
> >>> The change time is not updated, so some tail libraries do not show new
> >>> content until you force the change time to be updated, e.g. by running
> >>> "touch" on the file.
> >>> Some tools freeze and trigger other issues in the system.
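To illustrate why the failing tools stall, here is a minimal sketch of a polling follower in Python 3. It assumes, without confirmation from the thread, that the affected tail implementations decide whether to re-read based on ctime; the names `Snapshot`, `snapshot`, `changed_ctime_only`, `changed_any` and `read_new` are hypothetical. Comparing size and mtime as well as ctime still notices growth when the reading client reports a stale ctime, as in the deis-router.log stat output, where the size grew from 148564 to 152633 bytes while both timestamps stayed put.

```python
import os
from typing import NamedTuple, Tuple


class Snapshot(NamedTuple):
    size: int
    mtime_ns: int
    ctime_ns: int


def snapshot(path: str) -> Snapshot:
    """Capture the stat() fields a follower might poll."""
    st = os.stat(path)
    return Snapshot(st.st_size, st.st_mtime_ns, st.st_ctime_ns)


def changed_ctime_only(prev: Snapshot, cur: Snapshot) -> bool:
    # Fragile: misses appends when the client reports a stale ctime.
    return cur.ctime_ns != prev.ctime_ns


def changed_any(prev: Snapshot, cur: Snapshot) -> bool:
    # More robust: any difference in size, mtime or ctime counts.
    return cur != prev


def read_new(path: str, offset: int) -> Tuple[bytes, int]:
    """Return the bytes appended after `offset` and the new offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    return data, offset + len(data)
```

A real follower would call snapshot() in a loop, sleep between polls, and call read_new() whenever changed_any() is true; handling truncation (the size shrinking) is left out of this sketch.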
> >>>
> >>>
> >>> Tests, all on machine #2:
> >>>
> >>> FAILED -> https://github.com/ActiveState/tail
> >>> FAILED -> /usr/bin/tail of a Google docker image running debian wheezy
> >>> PASSED -> /usr/bin/tail of a ubuntu 14.04 docker image
> >>> PASSED -> /usr/bin/tail of the coreos release 494.5.0
> >>>
> >>>
> >>> On machine #1 (the same machine that is writing the file), all tests pass.
> >>>
> >>>
> >>>
> >>> On Mon, Jan 12, 2015 at 5:14 PM, Gregory Farnum <[email protected]> wrote:
> >>>> What versions of all the Ceph pieces are you using? (Kernel
> >>>> client/ceph-fuse, MDS, etc)
> >>>>
> >>>> Can you provide more details on exactly what the program is doing on
> >>>> which nodes?
> >>>> -Greg
> >>>>
> >>>> On Fri, Jan 9, 2015 at 5:15 PM, Lorieri <[email protected]> wrote:
> >>>>> The first 3 stat commands show the blocks and size changing, but not the
> >>>>> times. After a touch, the times change and tail works.
> >>>>>
> >>>>> I saw some CephFS freezes related to this; things came back after I
> >>>>> touched the files.
> >>>>>
> >>>>> coreos2 logs # stat deis-router.log
> >>>>>  File: 'deis-router.log'
> >>>>>  Size: 148564     Blocks: 291        IO Block: 4194304 regular file
> >>>>> Device: 0h/0d Inode: 1099511628780  Links: 1
> >>>>> Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
> >>>>> Access: 2015-01-10 01:13:00.100582619 +0000
> >>>>> Modify: 2015-01-10 01:13:00.100582619 +0000
> >>>>> Change: 2015-01-10 01:13:00.000000000 +0000
> >>>>> Birth: -
> >>>>> coreos2 logs # stat deis-router.log
> >>>>>  File: 'deis-router.log'
> >>>>>  Size: 152633     Blocks: 299        IO Block: 4194304 regular file
> >>>>> Device: 0h/0d Inode: 1099511628780  Links: 1
> >>>>> Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
> >>>>> Access: 2015-01-10 01:13:00.100582619 +0000
> >>>>> Modify: 2015-01-10 01:13:00.100582619 +0000
> >>>>> Change: 2015-01-10 01:13:00.000000000 +0000
> >>>>> Birth: -
> >>>>> coreos2 logs # stat deis-router.log
> >>>>>  File: 'deis-router.log'
> >>>>>  Size: 155763     Blocks: 305        IO Block: 4194304 regular file
> >>>>> Device: 0h/0d Inode: 1099511628780  Links: 1
> >>>>> Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
> >>>>> Access: 2015-01-10 01:13:00.100582619 +0000
> >>>>> Modify: 2015-01-10 01:13:00.100582619 +0000
> >>>>> Change: 2015-01-10 01:13:00.000000000 +0000
> >>>>> Birth: -
> >>>>>
> >>>>> coreos2 logs # touch deis-router.log
> >>>>>
> >>>>> coreos2 logs # stat deis-router.log
> >>>>>  File: 'deis-router.log'
> >>>>>  Size: 155763     Blocks: 305        IO Block: 4194304 regular file
> >>>>> Device: 0h/0d Inode: 1099511628780  Links: 1
> >>>>> Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
> >>>>> Access: 2015-01-10 01:13:46.961858103 +0000
> >>>>> Modify: 2015-01-10 01:13:46.961858103 +0000
> >>>>> Change: 2015-01-10 01:13:46.000000000 +0000
> >>>>> Birth: -
> >>>>>
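The workaround reported here (touching the file on the reading machine) can be automated as a stopgap, not a fix. This is a minimal Python 3 sketch under stated assumptions: the names `touch` and `nudge_stale_files` are hypothetical, and touching rewrites atime and mtime for every client, so anything relying on the original timestamps will see them change.

```python
import os
from typing import Dict, Iterable, List, Tuple


def touch(path: str) -> None:
    # Same effect as `touch FILE` on an existing file: atime and mtime are
    # set to "now", and the kernel bumps ctime as part of the change.
    os.utime(path, None)


def nudge_stale_files(
    paths: Iterable[str], last_seen: Dict[str, Tuple[int, int]]
) -> List[str]:
    """Touch each file whose size grew since the last call while its ctime
    stayed the same. `last_seen` maps path -> (size, ctime_ns) and is
    updated in place. Returns the list of touched paths."""
    touched = []
    for path in paths:
        st = os.stat(path)
        prev = last_seen.get(path)
        if prev is not None and st.st_size > prev[0] and st.st_ctime_ns == prev[1]:
            touch(path)
            touched.append(path)
            st = os.stat(path)  # record the post-touch state
        last_seen[path] = (st.st_size, st.st_ctime_ns)
    return touched
```

Calling it periodically with the log paths a tool follows, and keeping the same `last_seen` dict between calls, touches only files that grew without a ctime change.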
> >>>>> On Fri, Jan 9, 2015 at 11:11 PM, Lorieri <[email protected]> wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> I have a program that tails a file, and this file is created on
> >>>>>> another machine.
> >>>>>>
> >>>>>> Some tail programs do not work because the modification time is not
> >>>>>> updated on the remote machines.
> >>>>>>
> >>>>>> I found this old thread:
> >>>>>> http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/11001
> >>>>>>
> >>>>>> It mentions the problem and suggests syncing clocks with NTP.
> >>>>>>
> >>>>>> I tried re-syncing NTP and restarting the Ceph cluster, but the issue
> >>>>>> persists.
> >>>>>>
> >>>>>> Do you know if it is possible to avoid this behavior?
> >>>>>>
> >>>>>> thanks
> >>>>>> -lorieri
> >>>>> _______________________________________________
> >>>>> ceph-users mailing list
> >>>>> [email protected]
> >>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >