Thank you very much - we will be testing this soon.

Jason

On 6/16/16, 11:11 PM, "Yan, Zheng" <uker...@gmail.com> wrote:

>On Fri, Jun 17, 2016 at 5:03 AM, Jason Gress <jgr...@accertify.com> wrote:
>> This is the latest default kernel with CentOS7.  We also tried a newer
>> kernel (from elrepo), a 4.4 that has the same problem, so I don't think
>> that is it.  Thank you for the suggestion though.
>>
>> We upgraded our cluster to the 10.2.2 release today, and it didn't
>> resolve all of the issues.  It's possible that a related issue is
>> actually permissions.  Something may not be right with our config (or a
>> bug) here.
>>
>> While testing we noticed that there may actually be two issues here.  I
>> am unsure, as we noticed that the most consistent way to reproduce our
>> issue is to use vim or sed -i, which do in-place renames:
>>
>> [root@ftp01 cron]# ls -la
>> total 3
>> drwx------   1 root root 2044 Jun 16 15:50 .
>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>> -rw-r--r--   1 root root  300 Jun 16 15:50 file
>> -rw-------   1 root root 2044 Jun 16 13:47 root
>> [root@ftp01 cron]# sed -i 's/^/#/' file
>> sed: cannot rename ./sedfB2CkO: Permission denied
>>
>>
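For readers following the thread, the failing step is easier to see with the usual implementation of in-place editing in mind. The sketch below assumes the common pattern of writing to a temporary file and then renaming it over the original, which is consistent with the "cannot rename ./sedfB2CkO" message above. `comment_out_inplace` is a hypothetical helper written for illustration, not sed's actual code.

```shell
#!/bin/sh
# Sketch of the write-temp-then-rename pattern behind `sed -i`.
# comment_out_inplace is a hypothetical helper, not part of sed.
comment_out_inplace() {
    target="$1"
    dir=$(dirname -- "$target")

    # Create the temporary file in the same directory as the target,
    # so the final step is a rename within one filesystem.
    tmp=$(mktemp "$dir/sedXXXXXX") || return 1

    # Write the edited content (every line prefixed with '#') to the temp file.
    if ! sed 's/^/#/' "$target" > "$tmp"; then
        rm -f -- "$tmp"
        return 1
    fi

    # Replace the original by renaming the temp file over it.
    # A failure here corresponds to the "cannot rename" error in the thread.
    mv -f -- "$tmp" "$target"
}

# Usage (illustrative): comment_out_inplace /var/spool/cron/file
```

Because creating the temporary file and renaming it are separate operations, a client can be allowed to do the first and refused the second, which matches the observation that adding and deleting files worked while renaming did not.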
>> Strangely, adding or deleting files works fine; it's only renaming that
>> fails.  And strangely I was able to successfully edit the file on ftp02:
>>
>> [root@ftp02 cron]# sed -i 's/^/#/' file
>> [root@ftp02 cron]# ls -la
>> total 3
>> drwx------   1 root root 2044 Jun 16 15:49 .
>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>> -rw-r--r--   1 root root  313 Jun 16 15:49 file
>> -rw-------   1 root root 2044 Jun 16 13:47 root
>>
>>
>> Then it worked on ftp01 this time:
>> [root@ftp01 cron]# ls -la
>> total 3
>> drwx------   1 root root 2357 Jun 16 15:49 .
>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>> -rw-r--r--   1 root root  313 Jun 16 15:49 file
>> -rw-------   1 root root 2044 Jun 16 13:47 root
>>
>>
>> Then, I vim'd it successfully on ftp01... Then ran the sed again:
>>
>> [root@ftp01 cron]# sed -i 's/^/#/' file
>> sed: cannot rename ./sedfB2CkO: Permission denied
>> [root@ftp01 cron]# ls -la
>> total 3
>> drwx------   1 root root 2044 Jun 16 15:51 .
>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>> -rw-r--r--   1 root root  300 Jun 16 15:50 file
>> -rw-------   1 root root 2044 Jun 16 13:47 root
>>
>>
>> And now we have the zero file problem again:
>>
>> [root@ftp02 cron]# ls -la
>> total 2
>> drwx------   1 root root 2044 Jun 16 15:51 .
>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>> -rw-r--r--   1 root root    0 Jun 16 15:50 file
>> -rw-------   1 root root 2044 Jun 16 13:47 root
>>
>>
>> Anyway, I wonder how much of this issue is related to that cannot rename
>> issue above.  Here are our security settings:
>>
>> client.ftp01
>>         key: <redacted>
>>         caps: [mds] allow r, allow rw path=/ftp
>>         caps: [mon] allow r
>>         caps: [osd] allow rw pool=cephfs_metadata, allow rw pool=cephfs_data
>> client.ftp02
>>         key: <redacted>
>>         caps: [mds] allow r, allow rw path=/ftp
>>         caps: [mon] allow r
>>         caps: [osd] allow rw pool=cephfs_metadata, allow rw pool=cephfs_data
>>
>>
>> /ftp is the directory on cephfs under which cron lives; the full path is
>> /ftp/cron .
>>
>> I hope this helps and thank you for your time!
>
>I opened ticket http://tracker.ceph.com/issues/16358. The bug is in the
>path restriction code. For now, the workaround is to update the client
>caps so that they do not use path restriction.
>
>Regards
>Yan, Zheng
>
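The workaround described above could look roughly like the following. This is a sketch under assumptions: it takes the client names and pool names from the caps listing earlier in the thread, and it assumes that replacing "allow r, allow rw path=/ftp" with a plain "allow rw" is what "not use path restriction" means here. `caps_without_path` is a hypothetical helper that only prints the command so it can be reviewed before anything is changed. Since `ceph auth caps` replaces the whole cap set for an entity, the mon and osd caps are restated unchanged.

```shell
#!/bin/sh
# Sketch: print (do not run) a `ceph auth caps` command that drops the
# path restriction from a client's MDS caps.
# caps_without_path is a hypothetical helper; pool names come from the thread.
caps_without_path() {
    client="$1"   # e.g. client.ftp01
    printf "ceph auth caps %s mds 'allow rw' mon 'allow r' osd 'allow rw pool=cephfs_metadata, allow rw pool=cephfs_data'\n" "$client"
}

# Usage (illustrative): review the output, then run it on an admin node.
#   caps_without_path client.ftp01
#   caps_without_path client.ftp02
# Afterwards, `ceph auth get client.ftp01` shows the resulting caps.
```

Note that without the path restriction the client is no longer confined to /ftp on the MDS side, so this is a temporary measure until the tracker issue is fixed.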
>>
>> Jason
>>
>> On 6/15/16, 4:43 PM, "John Spray" <jsp...@redhat.com> wrote:
>>
>>>On Wed, Jun 15, 2016 at 10:21 PM, Jason Gress <jgr...@accertify.com> wrote:
>>>> While trying to use CephFS as a clustered filesystem, we stumbled upon
>>>> a reproducible bug that is unfortunately pretty serious, as it leads
>>>> to data loss.  Here is the situation:
>>>>
>>>> We have two systems, named ftp01 and ftp02.  They are both running
>>>> CentOS 7.2, with this kernel release and ceph packages:
>>>>
>>>> kernel-3.10.0-327.18.2.el7.x86_64
>>>
>>>That is an old-ish kernel to be using with cephfs.  It may well be the
>>>source of your issues.
>>>
>>>> [root@ftp01 cron]# rpm -qa | grep ceph
>>>> ceph-base-10.2.1-0.el7.x86_64
>>>> ceph-deploy-1.5.33-0.noarch
>>>> ceph-mon-10.2.1-0.el7.x86_64
>>>> libcephfs1-10.2.1-0.el7.x86_64
>>>> ceph-selinux-10.2.1-0.el7.x86_64
>>>> ceph-mds-10.2.1-0.el7.x86_64
>>>> ceph-common-10.2.1-0.el7.x86_64
>>>> ceph-10.2.1-0.el7.x86_64
>>>> python-cephfs-10.2.1-0.el7.x86_64
>>>> ceph-osd-10.2.1-0.el7.x86_64
>>>>
>>>> Mounted like so:
>>>> XX.XX.XX.XX:/ftp/cron /var/spool/cron ceph
>>>> _netdev,relatime,name=ftp01,secretfile=/etc/ceph/ftp01.secret 0 0
>>>> And:
>>>> XX.XX.XX.XX:/ftp/cron /var/spool/cron ceph
>>>> _netdev,relatime,name=ftp02,secretfile=/etc/ceph/ftp02.secret 0 0
>>>>
>>>> This filesystem has 234GB worth of data on it, and I created another
>>>> subdirectory and mounted it, NFS style.
>>>>
>>>> Here were the steps to reproduce:
>>>>
>>>> First, I created a file (I was mounting /var/spool/cron on two
>>>> systems) on ftp01:
>>>> (crond is not running right now on either system to keep the variables
>>>> down)
>>>>
>>>> [root@ftp01 cron]# cp /tmp/root .
>>>>
>>>> Shows up on both fine:
>>>> [root@ftp01 cron]# ls -la
>>>> total 2
>>>> drwx------   1 root root    0 Jun 15 15:50 .
>>>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>>>> -rw-------   1 root root 2043 Jun 15 15:50 root
>>>> [root@ftp01 cron]# md5sum root
>>>> 0636c8deaeadfea7b9ddaa29652b43ae  root
>>>>
>>>> [root@ftp02 cron]# ls -la
>>>> total 2
>>>> drwx------   1 root root 2043 Jun 15 15:50 .
>>>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>>>> -rw-------   1 root root 2043 Jun 15 15:50 root
>>>> [root@ftp02 cron]# md5sum root
>>>> 0636c8deaeadfea7b9ddaa29652b43ae  root
>>>>
>>>> Now, I vim the file on one of them:
>>>> [root@ftp01 cron]# vim root
>>>> [root@ftp01 cron]# ls -la
>>>> total 2
>>>> drwx------   1 root root    0 Jun 15 15:51 .
>>>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>>>> -rw-------   1 root root 2044 Jun 15 15:50 root
>>>> [root@ftp01 cron]# md5sum root
>>>> 7a0c346bbd2b61c5fe990bb277c00917  root
>>>>
>>>> [root@ftp02 cron]# md5sum root
>>>> 7a0c346bbd2b61c5fe990bb277c00917  root
>>>>
>>>> So far so good, right?  Then, a few seconds later:
>>>>
>>>> [root@ftp02 cron]# ls -la
>>>> total 0
>>>> drwx------   1 root root   0 Jun 15 15:51 .
>>>> drwxr-xr-x. 10 root root 104 May 19 09:34 ..
>>>> -rw-------   1 root root   0 Jun 15 15:50 root
>>>> [root@ftp02 cron]# cat root
>>>> [root@ftp02 cron]# md5sum root
>>>> d41d8cd98f00b204e9800998ecf8427e  root
>>>>
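The checksum above, d41d8cd98f00b204e9800998ecf8427e, is the MD5 of zero-length input, so it confirms that the file on ftp02 is truly empty rather than merely reported with a stale size. A small check along these lines could be run on each client to spot blanked files. `looks_blanked` is a hypothetical helper name, and the sketch assumes `md5sum` from coreutils is available, as in the transcripts.

```shell
#!/bin/sh
# MD5 of zero-length input (the value seen for 'root' on ftp02 above).
EMPTY_MD5=d41d8cd98f00b204e9800998ecf8427e

# Hypothetical helper: exits 0 when the file's content hashes to the
# empty-input MD5, non-zero otherwise.
looks_blanked() {
    # Reading from stdin keeps the file name out of md5sum's output.
    sum=$(md5sum < "$1" | cut -d' ' -f1)
    [ "$sum" = "$EMPTY_MD5" ]
}

# Usage (illustrative):
#   looks_blanked /var/spool/cron/root && echo "root has been blanked"
```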
>>>> And on ftp01:
>>>>
>>>> [root@ftp01 cron]# ls -la
>>>> total 2
>>>> drwx------   1 root root    0 Jun 15 15:51 .
>>>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>>>> -rw-------   1 root root 2044 Jun 15 15:50 root
>>>> [root@ftp01 cron]# md5sum root
>>>> 7a0c346bbd2b61c5fe990bb277c00917  root
>>>>
>>>> I later create a 'root2' on ftp02 and cause a similar issue.  The end
>>>> results are two non-matching files:
>>>>
>>>> [root@ftp01 cron]# ls -la
>>>> total 2
>>>> drwx------   1 root root    0 Jun 15 15:53 .
>>>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>>>> -rw-------   1 root root 2044 Jun 15 15:50 root
>>>> -rw-r--r--   1 root root    0 Jun 15 15:53 root2
>>>>
>>>> [root@ftp02 cron]# ls -la
>>>> total 2
>>>> drwx------   1 root root    0 Jun 15 15:53 .
>>>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>>>> -rw-------   1 root root    0 Jun 15 15:50 root
>>>> -rw-r--r--   1 root root 1503 Jun 15 15:53 root2
>>>>
>>>> We were able to reproduce this on two other systems with the same
>>>> cephfs filesystem.  I have also seen cases where the file would just
>>>> blank out on both as well.
>>>>
>>>> We could not reproduce it with our dev/test cluster running the
>>>> development ceph version:
>>>>
>>>> ceph-10.2.2-1.g502540f.el7.x86_64
>>>
>>>Strange.  In that cluster, was the same 3.x kernel in use?  There
>>>aren't a whole lot of changes on the server side in v10.2.2 that I
>>>could imagine affecting this case.
>>>
>>>The best thing to do right now is to try using ceph-fuse in your
>>>production environment, to check that it is not exhibiting the same
>>>behaviour as the old kernel client.  Once you confirm that, I would
>>>recommend upgrading your kernel to the most recent 4.x that you are
>>>comfortable with, and confirm that that also does not exhibit the bad
>>>behaviour.
>>>
>>>John
>>>
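A ceph-fuse test mount along the lines suggested above might look like the sketch below. It is an assumption-laden illustration: `fuse_mount_cmd` is a hypothetical helper that only prints the command, the monitor address stays redacted as in the fstab lines, and it assumes a keyring for the client is available under /etc/ceph, since ceph-fuse reads a keyring rather than the `secretfile=` option used by the kernel mount. The existing kernel mount on /var/spool/cron would need to be unmounted first.

```shell
#!/bin/sh
# Sketch: print (do not run) a ceph-fuse invocation that mounts the same
# subdirectory the kernel client was mounting.
# fuse_mount_cmd is a hypothetical helper for illustration.
fuse_mount_cmd() {
    client_id="$1"   # e.g. ftp01 (without the "client." prefix)
    mon_addr="$2"    # monitor address; redacted as XX.XX.XX.XX in the thread
    # --id selects the client, -m the monitor, -r the directory inside
    # CephFS to use as the root of the mount.
    printf 'ceph-fuse --id %s -m %s -r /ftp/cron /var/spool/cron\n' "$client_id" "$mon_addr"
}

# Usage (illustrative):
#   fuse_mount_cmd ftp01 XX.XX.XX.XX
```

Repeating the vim and sed -i steps on a mount made this way would show whether the behaviour is specific to the kernel client.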
>>>> Is this a known bug with the current production Jewel release?  If so,
>>>> will it be patched in the next release?
>>>>
>>>> Thank you very much,
>>>>
>>>> Jason Gress
>>>>
>>>> "This message and any attachments may contain confidential
>>>> information. If you have received this message in error, any use or
>>>> distribution is prohibited. Please notify us by reply e-mail if you
>>>> have mistakenly received this message, and immediately and permanently
>>>> delete it and any attachments. Thank you."
>>>>
>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
